Scientific Reports. 2024 Dec 16;14:30448. doi: 10.1038/s41598-024-78845-x

Application of FCEEMD-TSMFDE and adaptive CatBoost in fault diagnosis of complex variable condition bearings

Min Mao 1, Bingwei Xu 2, Yuhuan Sun 3, Kairong Tan 3, Yuran Wang 3, Chao Zhou 1, Chengjiang Zhou 3, Jingzong Yang 4
PMCID: PMC11649821  PMID: 39681579

Abstract

The mode mixing problem and intrinsic mode function selection bias in Fast Ensemble Empirical Mode Decomposition (FEEMD) result in ineffective extraction of fault components during the denoising stage; the loss of information during coarse-graining in Multiscale Fuzzy Dispersion Entropy (MFDE) reduces the stability of fault features; and the lack of adaptability of CatBoost hyperparameters leads to reduced diagnostic accuracy. Therefore, a fault diagnosis method for complex variable operating conditions based on Fast Complementary Ensemble Empirical Mode Decomposition (FCEEMD), Time-shift Multiscale Fuzzy Dispersion Entropy (TSMFDE), and adaptive Optuna-CatBoost is proposed. First, paired white noise with opposite signs is introduced in the construction of FCEEMD, which effectively suppresses mode aliasing by neutralizing the residual noise generated during decomposition. Then, a composite screening strategy based on the Maximal Information Coefficient and the Gini index is constructed, retaining the Intrinsic Mode Function (IMF) components that are strongly correlated with the original signal and carry fault impacts in order to reconstruct the denoised signal. Next, time-shift multiscale analysis is introduced into the coarse-graining process, and the resulting TSMFDE extracts complete and stable fault features. Finally, with the introduction of the Optuna hyperparameter optimization framework, the adaptive Optuna-CatBoost accurately diagnoses bearing faults. The average fault diagnosis accuracy of the proposed method reached 99.76% and 99.33%, indicating that FCEEMD based on white noise can quickly and accurately decompose non-aliasing vibration modes, and that the composite screening strategy can further filter out irrelevant noise modes and improve signal quality. The proposed TSMFDE can extract stable fault features, and its combination with Optuna-CatBoost can further improve the accuracy of fault diagnosis. This model is expected to be applied in more fields of feature extraction and pattern recognition.

Keywords: Complex variable operating condition fault diagnosis, Fast ensemble empirical mode decomposition, Time-shift multiscale fuzzy dispersion entropy, Maximal information coefficient / Gini index, Optuna-CatBoost

Subject terms: Engineering, Mathematics and computing, Physics

Introduction

As a key component of rotating machinery, the failure of bearings can lead to the failure of the entire system, and even cause serious safety accidents. In practical industrial applications, bearings often encounter a variety of working environments, which can change at any time. Therefore, it is essential to diagnose bearing faults promptly under complex and variable working conditions1. This has profound theoretical and practical significance for enhancing the reliability and safety of industrial equipment, reducing economic losses, and driving technological progress. The fundamental steps of bearing fault diagnosis methods include signal acquisition, preprocessing, feature extraction, and pattern recognition2.

With the increasing maturity of cloud computing and IoT technology, coupled with the continuous integration of big data in manufacturing, machine learning-based equipment fault prediction and diagnosis technology has been widely applied in the field of industrial equipment maintenance3. Hadi R et al. proposed Automated machine-learning (AutoML) techniques, which reduce the necessity for manual hyperparameter tuning and computational resources4. However, vibration signals frequently exhibit significant background noise interference, non-stationarity, nonlinear dynamic characteristics, and the coupling modulation phenomena of multiple frequency components5. Consequently, signal preprocessing, which mainly comprises signal decomposition and noise reduction, is particularly crucial. Several decomposition methods are commonly used. Local Mean Decomposition (LMD) is an adaptive time-frequency analysis method for processing nonlinear and non-stationary signals, but it has drawbacks such as high computational complexity, modal aliasing, dependence on initial parameters, and sensitivity to noise6. Variational Mode Decomposition (VMD) has drawbacks such as challenges with parameter selection, susceptibility to local optima, difficulty in handling extremely non-stationary or nonlinear signals, and the requirement of high computational resources7. Empirical Mode Decomposition (EMD) is an adaptive decomposition method for time series data, but it suffers from endpoint effects, modal aliasing, low standardization, and sensitivity to noise8. To address these issues, methods such as Ensemble Empirical Mode Decomposition (EEMD)9, Complementary Ensemble Empirical Mode Decomposition (CEEMD), Time-Varying Filter-based Empirical Mode Decomposition (TVF-EMD), and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) have been proposed successively. However, EEMD cannot fully neutralize the added white noise. Gu et al. decomposed the bearing vibration signal using CEEMD and selected the IMF with the highest correlation coefficient as the sensitive mode for envelope analysis10. However, EEMD and CEEMD, which rely on auxiliary noise, face issues such as cumbersome calculation processes and low efficiency, making them less suitable for real-time monitoring. Consequently, Wang Y H et al. proposed the Fast Ensemble Empirical Mode Decomposition (FEEMD)11, which significantly improves computational efficiency and can meet the real-time requirements of signal analysis. Sun W et al. employed FEEMD and correlation coefficients to determine the effective IMFs, which showed good performance in wind speed prediction12. Chegini S N et al. used FEEMD to decompose vibration signals and introduced the autocorrelation function to select the appropriate IMFs and eliminate noise components, accurately detecting the moment when bearing defects occur13. In addition to correlation coefficients, effective IMF components can also be selected based on quantitative indicators such as energy, kurtosis, mutual information, and the residual error index14. However, it is difficult to effectively select IMF components related to abnormal vibration and shock using a single screening indicator. Therefore, in practical applications, it is necessary to improve the efficiency of signal decomposition and minimize modal aliasing as much as possible. Additionally, more effective and comprehensive evaluation indicators are needed to suppress noise.

Common fault characteristics can be analyzed and extracted from the time domain, frequency domain, and time-frequency domain. The time-domain features mainly include mean, root mean square, peak, peak-to-peak, etc. The frequency-domain features mainly include spectrum energy, main frequency components, sidebands, etc. The time-frequency domain features are mainly obtained from the Short-time Fourier Transform (STFT) spectrogram15, wavelet transform, Hilbert-Huang transform, etc. In addition, entropy theory is a significant feature extraction method in the field of signal processing, used for quantifying the complexity and disorder of time series. Unlike other methods, the entropy method does not rely on manual experience and prior knowledge, making it easier to monitor, predict, and classify faults in rotating machinery. In recent years, Approximate Entropy (AE), Sample Entropy (SE), Fuzzy Entropy (FE), Distribution Entropy (DE), Permutation Entropy (PE), Dispersion Entropy (DisEn), and various improved versions of multi-scale entropy have been widely applied. Tan H et al. proposed the Normalized Balanced Multiscale Sample Entropy (NBMSE), which provides sensitive features for fatigue diagnosis16. Fuzzy entropy, on the other hand, utilizes fuzzy functions to determine the similarity between vectors, offering more information to detect the complexity of time series compared to sample entropy17. Wang S et al. introduced a new entropy measure, termed Cumulative Spectrum Distribution Entropy (CSDEn), which demonstrates superior performance in detecting dynamic changes and measuring signal complexity. It offers the advantages of low noise sensitivity and high computational efficiency18. Ma C et al. proposed the Composite Zoom Permutation Entropy, capable of effectively extracting early faults in rotating machinery19. Li Y et al. introduced Intrinsic Multiscale Dispersion Entropy (IMDE) as a novel framework. This method can effectively extract fault features under various working conditions, achieving the highest classification accuracy20. Compared to distribution entropy and permutation entropy, dispersion entropy offers advantages such as improved noise robustness, higher resolution, and adaptability to non-stationary signals. However, it is highly sensitive to its parameters, particularly the number of classes (quantization level), and the problem of missing effective information is often attributed to insufficient consideration of signal fluctuations in dispersion entropy. In addition, traditional coarse-grained methods face the challenge of not fully leveraging the effective information in the signal as the scale increases. To address these issues, several improved multiscale entropy extraction frameworks have been proposed, including composite entropy, generalized entropy, fine composite entropy, hierarchical entropy, and time-shifted entropy. Wang Z et al. proposed a method based on Refined Time-shifted Multiscale Fuzzy Entropy (RTSMFE), which improves upon traditional multiscale fuzzy entropy by enhancing the measurement of signal complexity. RTSMFE outperforms existing entropy techniques in complexity measurement and feature extraction21. Fuzzy distribution entropy was proposed by combining the fuzzy membership function with the empirical probability density function. Ma Y et al. proposed an improved multivariate multiscale fuzzy distribution entropy, which can extract fault features from multivariate vibration signals at different speeds22. This method measures the complexity of signals across multiple scales, overcoming the limitation of a single entropy measure in fully capturing feature information. More notably, Rostaghi M et al. innovatively proposed Fuzzy Dispersion Entropy (FDE), which is based on fuzzy membership functions and Shannon entropy theory. This algorithm demonstrates lower sensitivity to noise interference and signal length variations, outperforming traditional methods such as dispersion entropy, sample entropy, and permutation entropy23. However, as the scale of segmentation increases, the traditional coarsening process can shorten the length of the time series, which in turn reduces the stability of entropy and may result in the loss of valuable fault information.

Pattern recognition is a crucial component of bearing fault diagnosis under complex variable working conditions, which fundamentally entails inputting feature information into a classifier to identify various types of faults. The classifiers commonly employed for bearing fault identification encompass traditional machine learning and deep learning methods24. Traditional machine learning classifiers include Support Vector Machine (SVM), Extreme Learning Machine (ELM), Decision Tree (DT), Random Forest (RF), K-nearest neighbor (K-NN), Logistic Regression (LR), and Naive Bayes (NB), among others25. Deep learning classifiers include Convolutional Neural Networks (CNN), Probabilistic Neural Network (PNN), Long Short Term Memory Networks (LSTM), Gated Recurrent Units (GRU), Autoencoders, and Generative Adversarial Networks (GAN), among others26. Although deep learning methods can automatically learn feature representations and perform well in complex data analysis, there are still a series of issues, such as the need for sufficient sample data, high computational resource requirements, poor model interpretability, and overfitting risks27. Therefore, when dealing with small sample fault classification problems, machine learning methods, especially support vector machines (SVM), are still widely used. However, the performance of SVM largely depends on the selection of penalty factors C and kernel functions Inline graphic, as well as the quality and quantity of training samples28. In addition, algorithms such as CatBoost, LightGBM, and XGBoost are machine learning algorithms based on gradient boosting frameworks, all of which are implementations and optimizations of the GBDT (Gradient Boosting Decision Tree) algorithm29. The application of these methods in fault diagnosis can enhance the accuracy, generalization ability, and robustness of the model. Moreover, they can reduce the workload of data preprocessing and are capable of handling imbalanced data and multi-classification problems. Lao Z et al. developed an improved Focal Loss (IFL) to enhance the model's ability to identify similar feature samples in multi-class Light Gradient Boosting Machine (LightGBM) models30. Wang J et al. proposed Deep Support Vector Data Description (SVDD) and Conditional Wasserstein Generative Adversarial Nets (CWGAN) - Extreme Gradient Boosting (XGBoost). The trained XGBoost model can accurately diagnose faults31. A noteworthy aspect is that CatBoost stands out for its unique handling of category features; it can automatically utilize the original encoding of category features without the need for one-hot encoding, significantly reducing the risk of overfitting32. Zhou Y et al. proposed a deep feature extraction method based on Bilinear CNN, which combines the CatBoost algorithm with the fractional order method, particle swarm optimization algorithm, and ant colony optimization algorithm for predicting battery remaining capacity33. CatBoost supports parallel computing and typically requires minimal parameter tuning in practical applications. In recent years, many scholars have employed intelligent optimization algorithms to optimize the parameters of CatBoost34, such as Hunter-prey optimization (HPO)35, Sparrow Search Algorithm (SSA)29, Particle Swarm Optimization (PSO)36, Bayesian Optimization (BO)37, and others. Therefore, if its key parameters can be adaptively optimized, the classification performance of CatBoost can be further improved.

In summary, FEEMD and CEEMD are improved versions of EMD, with FEEMD focusing on enhancing computational efficiency and CEEMD addressing the issue of mode mixing. In terms of feature extraction, FDE is a novel entropy method that has emerged in recent years. By integrating fuzzy set theory into dispersion entropy, FDE enhances its effectiveness in handling time series with uncertainty or fuzziness. More noteworthy is that FDE demonstrates strong robustness in processing time series data that contain noise and can resist noise interference to a certain extent. In terms of classification recognition, CatBoost offers the advantages of automatically handling category features, reducing overfitting, high computational efficiency, strong robustness, and ease of integration. Therefore, this article considers the organic combination of these methods for bearing fault diagnosis. However, the following shortcomings remain: (1) Existing methods for selecting sensitive IMF modes still result in poor suppression of vibration noise. (2) Traditional coarse-grained methods cannot fully utilize the effective information in the signal as the scale increases. (3) The classification performance of the CatBoost model is contingent upon the effective selection of key parameters. To address these shortcomings, this paper introduces a bearing fault diagnosis method for complex variable conditions based on FCEEMD-TSMFDE feature extraction and adaptive Optuna-CatBoost.

The implementation process is as follows. First, FCEEMD decomposition is applied to the noise-contaminated vibration signals that represent different health states to obtain several IMF components. Second, the Inline graphic-based composite screening method is used to quantify and filter each IMF component, followed by signal reconstruction. Next, feature extraction is carried out using TSMFDE to create a feature set of bearing faults under varying conditions. Finally, this feature set is used to train the Optuna-CatBoost classifier, yielding an efficient and adaptive classification model. The innovation of this model lies in employing the F1-score of CatBoost as the direct objective for Optuna optimization (i.e., the fitness function), which significantly enhances the efficiency of the optimization process. This study validates the effectiveness of the proposed method through a series of comprehensive and detailed comparative experiments. The scheme outlined in this paper is depicted in Fig. 1.

Fig. 1.

Current State and Emerging Strategies for Fault Diagnosis Research.

The overall structure of this paper is as follows: Sect. 2 delves into the theories of signal decomposition, feature extraction, and classification recognition. Within this section, Sect. 2.1 offers a concise introduction to the FCEEMD theory and furnishes a detailed account of the application process of the composite screening method based on Inline graphic in signal denoising and reconstruction. Section 2.2 offers a concise introduction to the theory of FDE, the procedure for constructing time-shift sub-time series, and the feature extraction method based on TSMFDE. Section 2.3 provides a brief overview of the process of using Optuna to optimize CatBoost parameters. Section 3 offers a brief overview of the novel method proposed in this paper for diagnosing complex variable condition bearing faults. In Sect. 4, the superiority of our method is verified using bearing simulation signals and the XJTU and UO datasets. Additionally, the experimental process at each stage and the comparative analysis results of each method are discussed in detail. Section 5 presents the main conclusions of this experimental study.

The main innovations and contributions of this paper are summarized as follows:

  1. We propose signal decomposition, denoising, and reconstruction methods based on FCEEMD and Inline graphic. This method not only decomposes complex variable working condition vibration signals, but also employs a composite screening method (MIC and Gini index) to analyze various modal components. By setting an appropriate threshold for Inline graphic and selecting components with a higher Inline graphic index, the signal contaminated by noise is effectively denoised and reconstructed.

  2. We propose a feature extraction method based on TSMFDE. This method constructs effective time-shift sub-time series, which allows for the accurate extraction of features characterizing different health states of bearings and exhibits good noise resistance. Additionally, TSMFDE has a lower algorithmic complexity, which effectively enhances the computational efficiency of classification algorithms.

  3. We build an adaptive Optuna-CatBoost fault classifier. This classifier can swiftly determine the key parameters of CatBoost according to the distribution pattern of feature samples, thereby showcasing its efficient classification performance.

Theory of fault signal processing, feature extraction, and classification recognition

Signal processing method based on FCEEMD

Inspired by CEEMD and FEEMD, the FCEEMD algorithm combines the high computational efficiency of FEEMD with the low reconstruction error and low mode aliasing of CEEMD38,39. This paper proposes a signal reconstruction method based on FCEEMD and Inline graphic screening. IMFs containing noise and false components are removed based on Inline graphic, and the remaining IMFs are reconstructed to obtain an effective reconstructed signal. Given a group of vibration signals Inline graphic, the signal reconstruction algorithm combining FCEEMD and Inline graphic is as follows:

  1. Initialize the amplitude m of the added white noise and the ensemble number (number of integrations) I, and set the current ensemble index Inline graphic.

  2. Add the i-th pair of white noise sequences Inline graphic with equal amplitude and opposite sign to the original signal Inline graphic to generate two sets of noisy signals Inline graphic and Inline graphic11, as shown in Eq. (1):
    graphic file with name 41598_2024_78845_Article_Equ1.gif 1

    Where: t represents the time of signal change.

  3. Perform empirical mode decomposition on the signals Inline graphic and Inline graphic after adding white noise to obtain a series of IMF components, as shown in Eq. (2)8:
    graphic file with name 41598_2024_78845_Article_Equ2.gif 2

    Where: Inline graphic and Inline graphic represent the j-th IMF component obtained from the i-th empirical mode decomposition, and K represents the number of IMF.

  4. If the current decomposition count is less than the maximum decomposition count (Inline graphic), increase the decomposition count by 1 (Inline graphic) and repeat Steps 2 and 3; that is, add a new pair of white noise sequences Inline graphic with equal amplitude and opposite phase each time to obtain each IMF and the residual R.

  5. Calculate the average IMF component obtained from Inline graphic decompositions, as shown in Eq. (3)40.
    graphic file with name 41598_2024_78845_Article_Equ3.gif 3

    Where: Inline graphic is the j-th IMF component obtained by FCEEMD decomposition. The added white noise amplitude is 0.2 and the total number of integrations is 100, following ref. 40. FCEEMD-based denoising then continues with the following screening steps.

  6. Using the initialized FCEEMD parameters, carry out Steps 1 to 5 to obtain k IMFs. This paper sets k to 8.

  7. Calculate the Gini index and MIC value for each IMF. For a segment of component signal Inline graphic, remove background noise through a bandpass filter Inline graphic to obtain Inline graphic. Where K is the number of IMF components and N is the signal length. Perform Hilbert transform on signal Inline graphic to obtain Inline graphic. The Gini index is shown in Eq. (4)41:
    graphic file with name 41598_2024_78845_Article_Equ4.gif 4

    Where: Inline graphicis an ascending sequence of Inline graphic and Inline graphic is Inline graphic.

    Meanwhile, this paper introduces Maximal Information Coefficient(MIC), whose discrete form is defined as Eq. (5)42:
    graphic file with name 41598_2024_78845_Article_Equ5.gif 5
    Where: X and Y are two random variables, Inline graphic is the joint probability of X and Y, and Inline graphic and Inline graphicare the marginal probabilities of X and Y, respectively. However, estimating the joint distribution of variables is challenging. Therefore, the MIC algorithm divides X and Y into Inline graphicand Inline graphicblocks, forming a grid. Calculate mutual information within each grid and define the maximum mutual information with a fixed grid as Eq. (6)42:
    graphic file with name 41598_2024_78845_Article_Equ6.gif 6
    where: Inline graphic represents the sample space divided by Inline graphic. The maximum mutual information of each grid size is further normalized and recorded in the feature matrix, as shown in Eq. (7)42:
    graphic file with name 41598_2024_78845_Article_Equ7.gif 7
    MIC is defined as equation (8)42:
    graphic file with name 41598_2024_78845_Article_Equ8.gif 8
    where: Inline graphic represents the number of samples, and Inline graphic represents the upper limit of grid size. To prevent mode mixing, it is necessary to minimize the mutual information content between the decomposed modes. Simultaneously, the MIC between each decomposition mode and the original input signal should be maximized to preserve the effective information within the signal. Consequently, considering both the input signal and the modal quantities, and incorporating the Gini index, Eq. 9 is derived as a comprehensive evaluation index:
    graphic file with name 41598_2024_78845_Article_Equ9.gif 9
  8. Sort the IMFs in descending order of the comprehensive evaluation index obtained in Step 7.

  9. Select the IMFs that meet the threshold Inline graphic, and reconstruct the signal from the first N selected IMFs to obtain the effective signal.
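The paired-noise averaging in Steps 1 to 5 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `complementary_ensemble`, the stand-in decomposer `toy_decompose`, and the use of an absolute noise amplitude are all assumptions made for illustration, and a real EMD routine would replace the stand-in. The sketch also assumes the decomposer always returns the same number of components.

```python
import math
import random
from typing import Callable, List, Sequence

Signal = List[float]


def toy_decompose(x: Sequence[float]) -> List[Signal]:
    """Hypothetical stand-in for EMD: split x into a detail and a trend part."""
    n = len(x)
    trend = []
    for t in range(n):
        lo, hi = max(0, t - 1), min(n, t + 2)  # 3-point moving average
        trend.append(sum(x[lo:hi]) / (hi - lo))
    detail = [a - b for a, b in zip(x, trend)]
    return [detail, trend]


def complementary_ensemble(
    signal: Sequence[float],
    decompose: Callable[[Sequence[float]], List[Signal]],
    amplitude: float = 0.2,
    ensembles: int = 100,
    seed: int = 0,
) -> List[Signal]:
    """Average the components over `ensembles` pairs of +noise / -noise trials."""
    if ensembles < 1:
        raise ValueError("ensembles must be at least 1")
    rng = random.Random(seed)
    n = len(signal)
    total: List[Signal] = []
    for _ in range(ensembles):
        noise = [amplitude * rng.gauss(0.0, 1.0) for _ in range(n)]
        plus = [s + e for s, e in zip(signal, noise)]    # signal + noise
        minus = [s - e for s, e in zip(signal, noise)]   # signal - noise
        for noisy in (plus, minus):
            comps = decompose(noisy)
            if not total:
                total = [[0.0] * n for _ in comps]
            for acc, comp in zip(total, comps):
                for t, v in enumerate(comp):
                    acc[t] += v
    # Each component is the mean over 2 * ensembles decompositions.
    return [[v / (2 * ensembles) for v in acc] for acc in total]


sig = [math.sin(0.3 * t) for t in range(50)]
averaged = complementary_ensemble(sig, toy_decompose, ensembles=20, seed=1)
```

Because the stand-in decomposer is linear, the opposite-sign noise cancels exactly in the average, which makes the neutralizing effect of the paired noise easy to see. With a nonlinear decomposition such as EMD the cancellation is only approximate.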

In summary, the process of signal denoising based on FCEEMD is shown in Fig. 2.

Fig. 2.

Schematic diagram of signal denoising and reconstruction process.
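As an illustration of the sparsity measure used in Step 7, the sketch below computes a Gini index in the commonly used Hurley-Rickard form. It is an assumption that Eq. (4) follows this form; the function name `gini_index` is hypothetical, and the band-pass filtering and Hilbert envelope that precede it in the text are omitted, so the input is taken to be envelope samples that have already been computed.

```python
from typing import Sequence


def gini_index(values: Sequence[float]) -> float:
    """Gini index of a sequence: 0 for a flat sequence, near 1 for a sparse one."""
    mags = sorted(abs(v) for v in values)  # ascending order of magnitudes
    n = len(mags)
    l1 = sum(mags)
    if n == 0 or l1 == 0.0:
        return 0.0
    weighted = sum(
        (m / l1) * ((n - k + 0.5) / n) for k, m in enumerate(mags, start=1)
    )
    return 1.0 - 2.0 * weighted


flat = gini_index([1.0] * 8)             # all energy spread evenly
spiky = gini_index([0.0] * 7 + [5.0])    # all energy in one impulse
```

A component dominated by periodic fault impulses has a sparse envelope and therefore a larger Gini index than a noise-like component, which is why the index is useful for ranking IMFs.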

Feature extraction method based on TSMFDE

FDE is a novel encoding method designed to maintain the symbolic representation of subsequences. Multiscale Fuzzy Dispersion Entropy (MFDE) integrates FDE with multiscale theory to segment the original time series at various scales, thereby offering a more comprehensive analysis of signals. However, as the scale of segmentation increases, the traditional coarsening process can shorten the length of the time series, which in turn reduces the stability of entropy and may result in the loss of valuable fault information. Consequently, driven by the idea of time-shift, this paper introduces TSMFDE as a replacement for the traditional coarsening process in MFDE. The calculation process of the TSMFDE of time series Inline graphic with a length of N is as follows:

  1. Normalize the sequence x between 0 and 1. To follow the original form of Dispersion Entropy (DE), use the Normal Cumulative Distribution Function (NCDF). Time series y is obtained from the NCDF of time series x, as shown in Eq. (10)23:
    $$y_i = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x_i} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt \qquad (10)$$

    Where: $\sigma$ and $\mu$ represent the Standard Deviation (SD) and the average value of the time series x.

  2. Map time series y to sequence Inline graphic according to Eq. (11)23:
    graphic file with name 41598_2024_78845_Article_Equ70.gif 11

    Inline graphic is the i-th member of the Inline graphic series. c is a class parameter that represents the number of classes that can be assigned to Inline graphic.

  3. In DE, Inline graphic, which is closer to the integer k, belongs to the k-th class. Because the membership of the members of sequence Inline graphic that lie on the boundary of two classes is fuzzy, the fuzzy membership function Inline graphic is considered for each class. Inline graphic represents the degree of membership of Inline graphic to the k-th class. Each Inline graphic is assigned to one or two classes (with an integer index Inline graphic). The conditions that the membership functions should meet can be found in reference 43.

  4. Construct a time series Inline graphic with embedding dimension m and time delay d according to Eq. (12)23.
    graphic file with name 41598_2024_78845_Article_Equ80.gif 12
  5. Each vector Inline graphic is mapped to dispersion patterns based on its membership degree. In general, the number of dispersion patterns attributed to each vector Inline graphic in FDE, as in DE, is equal to Inline graphic.

  6. The probability of each dispersion pattern in the time series is shown in Eq. (13)23:
    graphic file with name 41598_2024_78845_Article_Equ90.gif 13

    Where: Inline graphic is the sum of the membership degrees of the dispersion pattern Inline graphic over all the series Inline graphic, divided by the total number of embedded vectors with embedding dimension m.

  7. Based on the Shannon Entropy (SHE) definition, the calculation of FDE with embedding dimension m and class number c is shown in Eq. (14)23:
    graphic file with name 41598_2024_78845_Article_Equ10.gif 14
  8. Then, for a specific maximum scale factor Inline graphic, the original time series x is constructed into a coarse-grained sequence using Eq. (15)44, namely:
    graphic file with name 41598_2024_78845_Article_Equ11.gif 15

    Where: j represents the length of coarse-grained time series.

  9. For each coarse-grained factor, calculate the FDE values at each scale to obtain the entropy values across all scales. As defined above, the length of each coarse-grained sequence is equal to the length of the original time series divided by the scaling factor. Consequently, as the length of the coarse-grained sequence decreases, the deviation in entropy values will gradually increase. The estimation error also increases with the scale factor, as discussed in reference45. It is evident that in the MFDE algorithm, the length of coarse-grained sequences is overly dependent on the length of the original time series and the choice of scaling factors for the time series. When the original time series is short and the scale factor is large, the coarse-grained time series may inevitably omit a significant amount of important information.

  10. The construction process of the Time-shift subsequence for the given original time series x is shown in Fig. 3, and Inline graphic is defined as shown in Eq. (16)44:
    graphic file with name 41598_2024_78845_Article_Equ12.gif 16

    Where: Inline graphic and Inline graphic are positive integers, representing the starting point and the sampling interval of the time series, respectively. Inline graphic is an integer obtained by rounding that gives the upper bound of the index, Inline graphic.

  11. When the scale factor is Inline graphic, calculate the TSMFDE value of the coarse-grained time series. When the scale factor is Inline graphic, calculate the TSMFDE of Inline graphic coarse-grained sequences. Because each time-shift coarse-grained sequence Inline graphic has a different starting point, the obtained TSMFDE values will inevitably differ from one another.

  12. Finally, when the scale factor is Inline graphic, the obtained Inline graphic different TSMFDE values are averaged, i.e.:
    graphic file with name 41598_2024_78845_Article_Equ13.gif 17

    Where: m represents the embedding dimension, and Inline graphic represents the delay time.
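The difference between traditional coarse-graining (Step 8) and the time-shift construction (Step 10) can be sketched as follows. The function names `coarse_grain` and `time_shift_subsequences` are hypothetical, and the sketch assumes the usual definitions: non-overlapping window means for coarse-graining, and every k-th sample starting from each of the k possible offsets for the time-shift subsequences.

```python
from typing import List, Sequence


def coarse_grain(x: Sequence[float], tau: int) -> List[float]:
    """Traditional coarse-graining: mean of consecutive non-overlapping windows."""
    n_out = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_out)]


def time_shift_subsequences(x: Sequence[float], k: int) -> List[List[float]]:
    """Time-shift construction: k subsequences, one per starting offset."""
    return [list(x[beta::k]) for beta in range(k)]


x = list(range(1, 11))                  # 1, 2, ..., 10
coarse = coarse_grain(x, 3)             # one sequence of window means
shifted = time_shift_subsequences(x, 3) # three sequences of raw samples
```

At scale 3 the coarse-grained route yields a single short sequence and drops the trailing sample, whereas the time-shift route yields three subsequences that together contain every original sample. Averaging the entropy over those subsequences is what Step 12 describes.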

Fig. 3.

The process of constructing a Time-shifted sub-time series.

In theory, TSMFDE optimizes the time series coarsening process that is lacking in the MFDE algorithm, thereby reducing the dependence on the original time series length for the comprehensive coarsening process. Consequently, TSMFDE is theoretically superior to MFDE for feature extraction. This paper applies TSMFDE to extract features from vibration signals under variable operating conditions.
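A rough sketch of the time-shift multiscale averaging is given below. Note the substitution: it uses plain dispersion entropy (hard class assignment by rounding) rather than the fuzzy membership functions of FDE, whose exact form is given in the cited references, so it illustrates the multiscale structure only. The function names and default parameters (m = 2, c = 3, d = 1) are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, fmean, pstdev
from typing import List, Sequence


def dispersion_entropy(x: Sequence[float], m: int = 2, c: int = 3, d: int = 1) -> float:
    """Normalized dispersion entropy (non-fuzzy stand-in for FDE)."""
    mu, sigma = fmean(x), pstdev(x)
    if sigma == 0.0:
        return 0.0  # a constant series has a single pattern
    ncdf = NormalDist(mu, sigma)
    # Map each sample to (0, 1) with the NCDF, then to a class in 1..c.
    classes = [min(c, max(1, round(c * ncdf.cdf(v) + 0.5))) for v in x]
    n_vec = len(classes) - (m - 1) * d
    if n_vec <= 0:
        raise ValueError("series too short for the chosen m and d")
    patterns = Counter(
        tuple(classes[i + j * d] for j in range(m)) for i in range(n_vec)
    )
    probs = [count / n_vec for count in patterns.values()]
    return -sum(p * math.log(p) for p in probs) / math.log(c ** m)


def time_shift_multiscale_entropy(
    x: Sequence[float], max_scale: int, m: int = 2, c: int = 3, d: int = 1
) -> List[float]:
    """For each scale tau, average the entropy over the tau time-shift subsequences."""
    values = []
    for tau in range(1, max_scale + 1):
        subsequences = [x[beta::tau] for beta in range(tau)]
        values.append(fmean([dispersion_entropy(s, m, c, d) for s in subsequences]))
    return values


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(600)]
features = time_shift_multiscale_entropy(noise, max_scale=5)
```

The returned list has one entry per scale and would serve as the feature vector passed to the classifier. Replacing `dispersion_entropy` with an FDE implementation would turn this sketch into the TSMFDE described in the text.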

Fault classification method based on Optuna-CatBoost

CatBoost is an enhanced algorithm that builds upon the Gradient Boosting Decision Tree (GBDT) framework46. In terms of accuracy, it has been reported to outperform both XGBoost and LightGBM within the same GBDT framework. CatBoost introduces a method based on Ordered Target Statistics (OTS), which involves randomly sorting all samples to create various sets of random sequences. During training, the average label value is used to replace the categorical feature values in a given sequence. For a more detailed explanation, refer to reference 33. Assuming that the i-th feature Inline graphic of the k-th sample is a categorical feature, the conversion formula is shown in Eq. (18)33. This method converts categorical features into numerical ones, thereby reducing the amount of computation and the degree of information loss.

graphic file with name 41598_2024_78845_Article_Equ14.gif 18

Where: Inline graphic is the i-th feature of the k-th sample; Inline graphic is the i-th categorical feature of the j-th sample before the k-th sample; Inline graphic is the label value of the j-th sample; Inline graphic is the dataset before the k-th sample in the random sorting; Inline graphic is the Iverson bracket, where Inline graphic is 1 when Inline graphic and Inline graphic belong to the same category (i.e., Inline graphic), and Inline graphic is 0 when they belong to different categories (i.e., Inline graphic); p is the added prior value, and a is its weight coefficient, which is typically greater than 0.
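The ordered target statistic described above can be sketched in a few lines. This is an illustrative reading of Eq. (18), not CatBoost's internal code: the function name `ordered_target_statistic` is hypothetical, the samples are assumed to be already arranged in the random order, and the labels are assumed to be numeric.

```python
from typing import Dict, Hashable, List, Sequence


def ordered_target_statistic(
    categories: Sequence[Hashable],
    labels: Sequence[float],
    prior: float,
    weight: float = 1.0,
) -> List[float]:
    """Encode each categorical value using only the samples that precede it."""
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    encoded = []
    for cat, y in zip(categories, labels):
        s = sums.get(cat, 0.0)      # label sum of earlier samples in this category
        n = counts.get(cat, 0)      # number of earlier samples in this category
        encoded.append((s + weight * prior) / (n + weight))
        sums[cat] = s + y           # update only after encoding the current sample
        counts[cat] = n + 1
    return encoded


codes = ordered_target_statistic(
    ["a", "b", "a", "a", "b"], [1, 0, 0, 1, 1], prior=0.5, weight=1.0
)
```

Because each sample is encoded before its own label is added to the running totals, the encoding never uses the target of the sample being encoded, which is the property that limits target leakage.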

When selecting a partition of the dataset, the score of every candidate partition is calculated as shown in Eq. (19)33:

$$s^{new} = s^{old} \cdot \left(1 + \frac{u}{U}\right)^{M} \tag{19}$$

Where: $s^{new}$ is the new score for categorical or combined feature partitions; $s^{old}$ is the old score for feature partitions; u is the number of features; U is the maximum value of u, indicating the number of possible feature combinations; M is the model size coefficient. The dataset partition with the best score is selected by comparing these scores.

High recognition accuracy with CatBoost depends on well-chosen hyperparameters. However, CatBoost has numerous parameters, each serving a distinct role. In this study, Optuna is used to adapt the CatBoost parameters automatically. Optuna is an open-source Hyper-parameter Optimization (HPO) framework designed to automate the search for hyperparameters across a defined space. To identify a good set of hyperparameters, Optuna employs Bayesian methods. The parameter optimization process within the Optuna framework is highly modular, allowing the search space to be defined dynamically. Key parameters are shown in Table 1. Throughout the model training process, the Optuna framework monitors the intermediate results of the classification and terminates (prunes) unpromising trials, thereby improving the efficiency of parameter selection47. Consequently, this paper utilizes Optuna to enable adaptive parameterization of the CatBoost model and to determine the values of the key parameters.

Table 1.

Description and setting of CatBoost Key parameters.

Parameter | Description
Iterations | Maximum number of iterations; set to [1000, 1100] in this paper
Depth | Tree depth, an integer not exceeding 32, typically ranging from 1 to 10; set to [4, 10] in this paper
Learning rate | Learning rate; the smaller the value, the more iterations are required for training; set to [0.008, 0.3] in this paper
Random strength | Perturbation term added to the information gain of feature splits to avoid overfitting; set to [1, 10] in this paper
Loss function | Loss function; set to multi-class in this paper
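
The ranges in Table 1 define the search space. The sketch below uses a plain random search as a dependency-free stand-in for Optuna's sampler, so it illustrates only the shape of the search space and the trial loop, not Optuna's Bayesian sampling or pruning. The names `SEARCH_SPACE`, `sample_params`, and `random_search` are hypothetical, the `objective` callable is assumed to train a classifier with the given parameters and return a validation score, and treating `random_strength` as an integer is an assumption based on the integer values reported in Table 4.

```python
import random

# Search ranges taken from Table 1 (kind, lower bound, upper bound).
SEARCH_SPACE = {
    "iterations": ("int", 1000, 1100),
    "depth": ("int", 4, 10),
    "learning_rate": ("float", 0.008, 0.3),
    "random_strength": ("int", 1, 10),
}


def sample_params(rng):
    """Draw one parameter combination uniformly from SEARCH_SPACE."""
    params = {}
    for name, (kind, low, high) in SEARCH_SPACE.items():
        if kind == "int":
            params[name] = rng.randint(low, high)  # inclusive bounds
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(objective, n_trials=20, seed=0):
    """Keep the best-scoring combination over n_trials random draws.

    objective: hypothetical callable, params -> score (higher is better),
               e.g. validation accuracy of a classifier trained with params.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In an Optuna-based setup, the body of `sample_params` would be replaced by the trial's suggestion calls and the loop by the study's optimization routine; the search ranges stay the same.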

In multi-classification problems, CatBoost employs the multi-class Log Loss function, also known as Cross-Entropy Loss. This function is appropriate for such tasks because it quantifies the discrepancy between the probability distribution predicted by the model and the actual distribution of labels. The mathematical expression for the multi-class Log Loss is provided in Eq. (20)33:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log\left(p_{ik}\right) \tag{20}$$

Where: $L$ is the loss function, N is the number of samples, K is the number of classes, $y_{ik}$ is the indicator variable (0 or 1) of whether the i-th sample belongs to category k, and $p_{ik}$ is the probability predicted by the model that the i-th sample belongs to category k.
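
As a small worked illustration of Eq. (20), the sketch below evaluates the multi-class Log Loss for integer class labels. Because the indicator is 1 only for the true class, the inner sum reduces to the log-probability assigned to that class. The function name and the clipping constant `eps` are assumptions added to avoid taking the logarithm of zero.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability of the true class.

    y_true: list of integer class indices (0 .. K-1)
    probs:  list of per-sample probability lists, each of length K
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to [eps, 1] so that a zero probability does not give -inf.
        total += math.log(min(max(p[label], eps), 1.0))
    return -total / len(y_true)
```

A perfectly confident and correct prediction gives a loss of 0, and the loss grows as probability mass moves away from the true class.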

Accuracy, Macro_Precision, Macro_Recall, and F1-score are employed to evaluate the multi-classification performance of CatBoost, as detailed in equations (21) to (24)48. Because Macro_Precision and Macro_Recall tend to trade off against each other, the F1-score is selected as a comprehensive metric for model evaluation. The F1-score represents the harmonic mean of Macro_Precision and Macro_Recall.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$
$$\mathrm{Macro\_Precision} = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k} \tag{22}$$
$$\mathrm{Macro\_Recall} = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_k}{TP_k + FN_k} \tag{23}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Macro\_Precision} \times \mathrm{Macro\_Recall}}{\mathrm{Macro\_Precision} + \mathrm{Macro\_Recall}} \tag{24}$$

Here, TP (True Positives) represents the number of positive samples correctly predicted as positive, while FN (False Negatives) represents the number of positive samples incorrectly predicted as negative. Similarly, FP (False Positives) denotes the number of negative samples incorrectly predicted as positive, and TN (True Negatives) indicates the number of negative samples correctly predicted as negative. The subscript k denotes the counts obtained when class k is treated as the positive class, and K is the number of classes.
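
The macro-averaged metrics can be computed directly from a confusion matrix, as sketched below. The layout (rows are true classes, columns are predicted classes) and the function name are assumptions; the accuracy is computed as the fraction of correctly classified samples, which is the usual multi-class reading of Eq. (21), and the F1-score follows the harmonic-mean definition stated in the text.

```python
def macro_metrics(cm):
    """Return (accuracy, macro precision, macro recall, F1) from a K x K matrix.

    cm[r][c] is the number of samples of true class r predicted as class c.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls = [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                       # true c, predicted class differs
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    macro_p = sum(precisions) / k
    macro_r = sum(recalls) / k
    f1 = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0
    return accuracy, macro_p, macro_r, f1
```

When every class has the same number of test samples, as in the balanced splits used in this paper, Macro_Recall coincides with Accuracy.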

Bearing fault diagnosis method under variable operating conditions

To comprehensively and accurately diagnose complex bearing faults under variable operating conditions, an intelligent bearing fault diagnosis method based on FCEEMD-TSMFDE feature extraction and Optuna-CatBoost is proposed. The specific implementation process of this paper is outlined as follows:

  1. Set the parameters for FCEEMD and utilize FCEEMD to decompose the variable operating condition vibration signals of different types of faults, obtaining K IMF components.

  2. Calculate the MIC/Gini index for each IMF component and sort the IMF components in descending order of this index. At the same time, set the threshold of the vibration signal denoising evaluation index.

  3. Retain the IMF components whose MIC/Gini value is greater than the threshold, and reconstruct these components to obtain the noise-reduced vibration signal.

  4. Set the parameters for TSMFDE and apply it to extract features from the reconstructed signal, thereby obtaining a fault feature set that represents bearings under variable operating conditions and different health states.

    The feature matrix for each state is of size 60 × 20.

  5. Input the fault feature set into the Optuna-CatBoost classifier for training, which achieves adaptive optimization of CatBoost’s key parameters. This process results in an optimal adaptive fault diagnosis model. Finally, the model is tested and verified on various types of faults, achieving accurate identification of bearing faults, including composite faults, under variable operating conditions.

    The superiority of the proposed method is verified through comparative analysis. Figure 4 illustrates the fault diagnosis method of complex variable operating condition bearings that is proposed in this paper.
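
Steps 2 and 3 above (ranking the IMF components by their screening index and reconstructing the signal from those that exceed the threshold) can be sketched as follows. The index values are assumed to be already computed; the function names are hypothetical, and the decomposition itself is not implemented here.

```python
def screen_imfs(scores, threshold):
    """Return the indices of IMFs whose score exceeds the threshold,
    ordered from the highest score to the lowest."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > threshold]


def reconstruct(imfs, keep):
    """Sum the selected IMFs sample by sample to form the denoised signal."""
    n = len(imfs[0])
    return [sum(imfs[i][t] for i in keep) for t in range(n)]
```

With scores `[0.2, 1.0, 0.5]` and a threshold of `0.3`, `screen_imfs` returns `[1, 2]`, and `reconstruct` then adds only those two components.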

Fig. 4.

Schematic diagram of fault diagnosis under complex variable operating conditions.

Analysis of experimental results

To verify and demonstrate the practicability of the proposed method for bearing fault diagnosis under variable operating conditions, experimental analyses were conducted using simulation signals and two bearing datasets from XJTU49,50 and UO51. Additionally, so that the methods can be benchmarked under identical conditions, all experiments were performed on a hardware setup featuring an Intel® Core™ i9-10900K CPU @ 3.70 GHz with 32 GB RAM, and the analyses were conducted using MATLAB R2018b (64-bit).

Experimental verification of vibration simulation signals

In the feature extraction phase, this paper uses a simulated vibration signal for experimentation and constructs the bearing simulation signal as shown in Eq. (25).

$$s(t) = y_0\, e^{-2\pi f_n \xi t} \sin\left(2\pi f_n \sqrt{1 - \xi^2}\; t\right) \tag{25}$$

Where: the natural frequency $f_n$ of the bearing is 3000 Hz, the displacement constant $y_0$ is 2.5, the damping coefficient $\xi$ is 0.1, the impact fault period T is 0.00625 s (characteristic frequency $f_0$ is 160 Hz), and the sampling frequency $f_s$ is 20 kHz.
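
A simulated signal with the parameters listed above can be generated as sketched below. The sketch assumes the common damped-impulse model, in which an exponentially decaying sinusoid at the natural frequency restarts at every fault period T; the exact expression used by the authors is not recoverable from the text, so the functional form, the function name, and the choice to restart the impulse on integer sample boundaries are assumptions. The default length of 4096 points is the data length used for the simulated signal in this paper.

```python
import math


def simulate_bearing_signal(n_points=4096, fs=20000.0, fn=3000.0,
                            y0=2.5, xi=0.1, period=0.00625):
    """Periodic damped impulse response (assumed model).

    fs: sampling frequency (Hz), fn: natural frequency (Hz),
    y0: displacement constant, xi: damping coefficient,
    period: impact fault period T (s), i.e. 1 / 160 Hz.
    """
    period_samples = round(period * fs)  # 125 samples per impact at 20 kHz
    omega_d = 2 * math.pi * fn * math.sqrt(1 - xi ** 2)
    signal = []
    for i in range(n_points):
        t = (i % period_samples) / fs    # time since the latest impact
        envelope = y0 * math.exp(-2 * math.pi * fn * xi * t)
        signal.append(envelope * math.sin(omega_d * t))
    return signal
```

With these settings the waveform repeats every 125 samples, which corresponds to the 160 Hz characteristic frequency.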

The time-domain diagram of the simulated signal is shown in Fig. 5(a), with the number of data points set to 4096, and the waveform exhibits a clear periodicity. White noise at -2 dB is then added to the simulated signal. As can be seen in Fig. 5(b), the periodicity of the waveform is partly obscured. Subsequently, using the FCEEMD method, the simulated signal mixed with white noise is decomposed into 8 IMF components, which cover different frequency bands from high to low frequencies; the decomposition results are shown in Fig. 6. The method effectively suppresses mode aliasing between the components. Among them, the periodic pulses of IMF1-IMF3 are still evident. As shown in Fig. 7, the MIC/Gini index values of these three components are relatively high, with values of 1.246, 0.4391, and 0.2717, respectively. The MIC/Gini index values of the remaining components continue to decrease and tend to stabilize. Therefore, it can be concluded that IMF4-IMF8 contain noise components (or false components) unrelated to the fault. By setting an appropriate threshold, the effective IMF components are selected. The reconstructed simulation signal is shown in Fig. 5(c). At the same time, this paper uses larger MIC/Gini and Kurtosis values and lower RMSE, MAE, and MSE values to demonstrate the noise reduction performance of FCEEMD, as shown in equations (26) to (29). The resulting noise reduction index values are 0.3585, 4.9274, 0.4470, 0.3392, and 0.1998, respectively. Moreover, the decomposition time is only 0.0083 s.

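$$\mathrm{Kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - \mu\right)^4}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - \mu\right)^2\right)^2} \tag{26}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \tag{27}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right| \tag{28}$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2 \tag{29}$$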

Fig. 5.

Simulation Signals in Different Contexts.

Fig. 6.

Time-domain and frequency-domain diagrams of the first N IMF components after FCEEMD decomposition of the simulated signal.

Fig. 7.

The IMF components of the simulated signal are arranged in descending order according to the MIC/Gini index.

Where, in equations (26) to (29): $x_i$ is the noise-free signal, $\hat{x}_i$ is the noise-reduced signal, $\mu$ is the mean value of the signal, and n is the number of signal sampling points.
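
The error-based indicators can be computed as sketched below. The definitions follow the standard forms of kurtosis, RMSE, MAE, and MSE with the symbols defined above; applying the kurtosis to the noise-reduced signal is an assumption, and the function names are hypothetical.

```python
import math


def mse(clean, denoised):
    """Mean squared error between the noise-free and noise-reduced signals."""
    return sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)


def rmse(clean, denoised):
    """Root mean squared error (square root of the MSE)."""
    return math.sqrt(mse(clean, denoised))


def mae(clean, denoised):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(clean, denoised)) / len(clean)


def kurtosis(x):
    """Fourth central moment divided by the squared second central moment."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / m2 ** 2
```

Since RMSE is the square root of MSE, the two values reported for FCEEMD are mutually consistent: 0.4470 squared is approximately 0.1998.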

In addition, this paper also uses EMD, LMD, VMD, FMD, TVF-EMD, and CEEMDAN for comparative analysis. The effect of FCEEMD is closest to that of CEEMDAN and FMD. However, FCEEMD has the shortest decomposition time, far shorter than those of CEEMDAN and FMD. Both VMD and TVF-EMD require long decomposition times, especially VMD, which takes 16.594 s. Although the two classic methods, EMD and LMD, have short decomposition times, their denoising effect is weaker. The detailed results of each indicator are shown in Table 2. Most notably, compared to the other six signal processing methods, the MIC/Gini index of the bearing vibration simulation signal increased by 17.62%, 10.48%, 75.48%, 12.38%, 77.39%, and 1.16%, respectively. Most of the remaining noise reduction indicators are also better than those of the other methods; the exceptions in Table 2 are the MAE of VMD and the MSE of CEEMDAN, which are lower. FCEEMD has the highest Kurtosis, the lowest RMSE, the least time consumption, and the most stable IMF components. By introducing white noise with opposite signs, the residual white noise in FEEMD can be neutralized, and mode aliasing between IMFs can be suppressed. At the same time, based on the evaluation index MIC/Gini, the mutual information content between the IMF components and the original signal is maximized, and the effective components and interference components of the signal are well distinguished. In summary, this paper uses FCEEMD and MIC/Gini to enhance the noise reduction efficiency of vibration signals under complex variable working conditions.

Table 2.

Comparison of noise reduction indicators using different signal decomposition methods.

Signal decomposition method | MIC/Gini | Kurtosis | RMSE | MAE | MSE | Time (s)
EMD | 0.3048 | 4.6330 | 0.4869 | 0.3641 | 0.2119 | 0.0441
LMD | 0.3245 | 4.6461 | 0.4516 | 0.3624 | 0.2040 | 0.2471
VMD | 0.2043 | 3.3163 | 0.4669 | 0.3224 | 0.2180 | 16.594
FMD | 0.3190 | 4.9059 | 0.4546 | 0.3640 | 0.2067 | 0.2293
TVF-EMD | 0.2021 | 3.1822 | 0.4957 | 0.3578 | 0.2457 | 7.1953
CEEMDAN | 0.3544 | 4.4673 | 0.4699 | 0.3395 | 0.1763 | 1.5158
FCEEMD | 0.3585 | 4.9274 | 0.4470 | 0.3392 | 0.1998 | 0.0083
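
The percentage increases of the MIC/Gini index quoted in the text follow from the values in Table 2 as relative gains of FCEEMD over each comparison method. The short check below reproduces them; the variable and function names are illustrative.

```python
# MIC/Gini values from Table 2.
mic_gini = {
    "EMD": 0.3048,
    "LMD": 0.3245,
    "VMD": 0.2043,
    "FMD": 0.3190,
    "TVF-EMD": 0.2021,
    "CEEMDAN": 0.3544,
}
fceemd = 0.3585


def relative_gain(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


gains = {name: round(relative_gain(fceemd, value), 2)
         for name, value in mic_gini.items()}
```

The resulting gains are 17.62, 10.48, 75.48, 12.38, 77.39, and 1.16 percent, matching the figures reported for EMD, LMD, VMD, FMD, TVF-EMD, and CEEMDAN.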

Experimental verification of XJTU datasets

Introduction to XJTU Experimental platform

This experiment uses the mechanical fault comprehensive simulation test bench of SQ (Spectra Quest) company to simulate the outer and inner ring faults of motor bearings. The test bench consists of three major parts: motor, rotor, and load. The piezoelectric acceleration sensors are used to collect motor bearing signals, and the data acquisition instrument is a CoCo80 with a sampling frequency of 25.6 kHz. The motor bearing model is NSK6203, and the faulty bearing is located at the motor drive end. The acceleration sensor is mounted directly above the motor drive end cover using a magnetic base. The XJTU-bearing experimental platform is shown in Fig. 8.

Fig. 8.

XJTU Mechanical Fault Comprehensive Simulation Test Bench.

The experimental data consist of channel 2 vibration signals from bearings in three conditions, Inner ring Failure (IF), Outer ring Failure (OF), and Normal Condition (NC), with three damage levels for each fault type. These signals are used to classify the seven bearing health states. The minor damage covers an area of 4 mm² with a depth of 0.5 mm; the moderate damage covers an area of 8 mm² with a depth of 4 mm; and the severe damage covers an area of 12 mm² with a depth of 2 mm. Each data acquisition lasts for 15 s, covering a full acceleration phase from a stationary state to 3000 rpm, followed by a period of stable operation and finally a gradual deceleration from the operating speed to a stop. In this paper, consecutive samples are taken at intervals of 100 data points, and each sample contains 4096 data points. There are 60 samples for each health state, giving a total of 420 samples for the 7 states. The data description is shown in Table 3, which includes details on the damage area, damage depth, fault labels (ranging from 1 to 7, corresponding to the fault IDs IF_1, IF_2, IF_3, NC, OF_1, OF_2, and OF_3), and the distribution ratio of the training set (80%) and the test set (20%). Taking the operation under accelerated conditions as an example, the time-domain waveform diagram of the vibration signal is shown in Fig. 9. Given that the vibration signal is inherently nonlinear and non-stationary, it is not possible to obtain complete bearing health status information solely from the time-domain waveform. Therefore, feature extraction and diagnostic models are necessary to identify the types and degrees of faults.
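
The sample construction described above can be sketched as an overlapping sliding window. This assumes that "intervals of 100 data points" means that consecutive samples start 100 points apart; the function name and argument names are hypothetical.

```python
def segment(signal, length=4096, step=100, n_samples=60):
    """Cut n_samples overlapping windows of `length` points,
    with consecutive windows starting `step` points apart."""
    needed = length + (n_samples - 1) * step
    if len(signal) < needed:
        raise ValueError(f"signal too short: need at least {needed} points")
    return [signal[i * step: i * step + length] for i in range(n_samples)]
```

Under this reading, 60 samples of 4096 points require only 4096 + 59 × 100 = 9996 consecutive data points of the recorded signal.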

Table 3.

Description of 7 working states of bearing.

Health status | ID | Damage area (mm²) | Damage depth (mm) | Label | Train size | Test size
Mild inner ring | IF_1 | 4 | 0.5 | 1 | 0.8 | 0.2
Moderate inner ring | IF_2 | 8 | 4 | 2 | 0.8 | 0.2
Severe inner ring | IF_3 | 12 | 2 | 3 | 0.8 | 0.2
Normal | NC | 0 | 0 | 4 | 0.8 | 0.2
Mild outer ring | OF_1 | 4 | 0.5 | 5 | 0.8 | 0.2
Moderate outer ring | OF_2 | 8 | 4 | 6 | 0.8 | 0.2
Severe outer ring | OF_3 | 12 | 2 | 7 | 0.8 | 0.2

Fig. 9.

Time domain waveform of vibration signals for different types of faults (taking speed increase as an example).

Signal Reconstruction and feature extraction (XJTU)

First, -10 dB white noise is added to all the original variable operating condition vibration signals. Subsequently, the FCEEMD method is used to denoise the signals of the different health states. Taking the inner ring fault signal IF_1 as an example, it is decomposed into several IMF components, as shown in Fig. 10. It can be seen from Fig. 10(a) that IMF1-IMF3 still contain obvious periodic characteristics, whereas IMF4-IMF8 contain noise components. Figure 10(b) indicates that there is no mode mixing between the IMF components. Subsequently, the MIC/Gini values for each IMF component are calculated and arranged in descending order. As shown in Fig. 11, the resulting order is IMF1, IMF2, IMF3, IMF4, IMF7, IMF8, IMF6, and IMF5. For IMF1-IMF3, the MIC/Gini values show a decreasing trend, with values of 1.417, 0.4593, and 0.2946, respectively. The MIC/Gini values of the remaining IMF components are smaller and tend to stabilize. Based on the comparative analysis of the bearing fault classification experiments in Sect. 4.2.3, it is concluded that IMF1-IMF2 meet the threshold. This indicates that these components not only have a high correlation with the original signal, but also retain most of the periodic pulse components caused by fault impacts. Ultimately, IMF1-IMF2 are reconstructed to form the effective vibration signal. Likewise, the vibration signals of the remaining six health states under complex variable operating conditions were denoised and reconstructed following the same screening criteria. Although the signals have been reconstructed, the differences in amplitude and impact characteristics between them remain insufficiently significant.
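
Adding white noise at a prescribed level can be sketched as follows, assuming that the quoted value (for example -10 dB) is the signal-to-noise ratio of the noisy signal. The function names and the use of a seeded Gaussian generator are illustrative choices.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise so that the SNR is approximately snr_db."""
    rng = random.Random(seed)
    signal_power = sum(v * v for v in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def snr_db(clean, noisy):
    """Measure the SNR (dB) of `noisy` relative to the clean signal."""
    p_signal = sum(v * v for v in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)
```

At -10 dB the noise power is ten times the signal power, which explains why the periodic impacts are barely visible in the raw time-domain waveform.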

Fig. 10.

Time-domain and frequency-domain plots of the first N IMF components after FCEEMD decomposition of the inner ring fault signal (IF_1, -10dB).

Fig. 11.

IMF components of the inner ring fault signal (IF_1) arranged in descending order of the MIC/Gini index (-10 dB).

To verify the superiority of the feature extraction method proposed in this paper, HFDE, RCMFDE, CMFDE, and MFDE are used for comparative analysis. Under variable operating conditions, the mean entropy distribution curves and three-dimensional scatter plots for different fault types are shown in Fig. 12. As shown in Fig. 12(a1)–(e1), compared to other methods, TSMFDE yields more distinct feature differences and relatively smaller redundancy among different types of faults. As can be seen from Fig. 12(a2)–(e2), TSMFDE and HFDE outperform RCMFDE, CMFDE, and MFDE. Compared to the features extracted by HFDE, those of TSMFDE exhibit smaller differences between classes but a higher degree of aggregation within each class, which is conducive to fault identification. This is attributed to the robustness of TSMFDE to the time series length and its superior recognition capability, which give the extracted features small fluctuations, enhanced discriminability, and greater stability. Therefore, this article uses TSMFDE for fault feature extraction, with a feature dimension of 20. A fault feature vector set is then constructed, with a feature matrix size of 7 × 60 × 20. However, the features extracted by all five types of fuzzy dispersion entropy still overlap to some extent, which makes the signals of different states difficult to distinguish. This indicates that relying solely on the distribution of entropy values cannot effectively distinguish the vibration signals of different bearing faults, especially under variable operating conditions.

Fig. 12.

Distribution of five types of FDEs under seven bearing health conditions (-10dB).

Fault Identification under Complex Variable operating conditions (XJTU)

To fully validate the superiority of the FCEEMD-TSMFDE method in feature extraction, an adaptive Optuna-CatBoost classifier is employed for fault diagnosis, and the different health states of the bearing are identified under complex variable operating conditions. First, the IMF components obtained from FCEEMD decomposition are arranged in descending order of the comprehensive evaluation index MIC/Gini. The first 1 to 8 IMF components are then extracted in turn to reconstruct the vibration signal, yielding 8 reconstructed signals. Finally, five methods, TSMFDE, HFDE, RCMFDE, CMFDE, and MFDE, are used for feature extraction, and the adaptive Optuna-CatBoost classifier is used for fault identification. The vibration signal mixed with -10 dB white noise is analyzed first, and the results averaged over 10 experiments are shown in Fig. 13(a). When the first 2 IMF components are extracted to reconstruct the vibration signal, TSMFDE achieves the highest average recognition rate of 99.76%. Only 2 misidentifications occurred across the 10 experiments, and the other 8 experiments achieved a 100% recognition rate. This demonstrates that decomposing the signal with FCEEMD and screening the IMF components with MIC/Gini can effectively remove noise components (or false components). For the remaining four feature extraction methods, the average accuracy peaks when the first 3 IMF components are extracted: MFDE reaches 93.57%, CMFDE 93.93%, RCMFDE 93.57%, and HFDE 96.79%. Compared to TSMFDE, these four methods are lower by 6.19, 5.83, 6.19, and 2.97 percentage points, respectively. To further validate the effectiveness of the signal denoising and reconstruction methods proposed in this paper, the signals mixed with -6 dB, -4 dB, and -2 dB white noise are analyzed using an identical process. The results are shown in Fig. 13(b)–(d), respectively.

When extracting the first 2 IMF components, the average recognition rates of TSMFDE were 99.88%, 100%, and 100%, respectively, all of which were the highest. The other four types of FDEs are inferior to TSMFDE. Meanwhile, as the number of IMF components increases, the recognition rates first increase and then decrease. Therefore, the superiority of the FCEEMD-TSMFDE method has been demonstrated.

Fig. 13.

Bearing fault classification recognition rate obtained from different feature dimensions and feature extraction methods.

Furthermore, the bearing vibration signal mixed with -10 dB white noise was decomposed and filtered for noise reduction using FCEEMD and MIC/Gini. After the optimal reconstruction, a single fault classification experiment is analyzed in more detail. The confusion matrix is shown in Fig. 14. As shown in Fig. 14(a), the recognition rate of the Optuna-CatBoost classifier is 98.81% when using the fault features extracted by TSMFDE. Only 1 OF_1 sample is misidentified as IF_3, which is the best classification result. In this case, the adaptive classifier uses 1042 iterations, a depth of 7, a learning_rate of 0.196, and a random_strength of 7. The remaining results are shown in Fig. 14(b)–(e), with 3, 5, 6, and 6 samples misidentified, respectively, corresponding to accuracies of 96.43%, 94.05%, 92.86%, and 92.86%. Based on the above analysis, TSMFDE clearly outperforms HFDE, RCMFDE, CMFDE, and MFDE. The optimal parameter combinations of the adaptive Optuna-CatBoost classifier obtained with the different features are shown in Table 4.

In addition, key parameter optimization for CatBoost is conducted using BO, PSO, SSA, and HPO to obtain the respective adaptive classifiers, and Optuna is also used to achieve parameter adaptation for the LightGBM and XGBoost models separately. To validate the effectiveness of the proposed method, these models are compared with the adaptive Optuna-CatBoost. Detailed experimental comparison results are shown in Table 5. When TSMFDE is combined with the Optuna-CatBoost classifier, the bearing fault classification accuracy reaches 99.76%, with a Macro_Precision of 99.79%, a Macro_Recall of 99.76%, an F1-score of 99.76%, and a model training time of 38.32 s. Compared with the six combination methods TSMFDE-Optuna-XGBoost, TSMFDE-Optuna-LightGBM, TSMFDE-HPO-CatBoost, TSMFDE-SSA-CatBoost, TSMFDE-PSO-CatBoost, and TSMFDE-BO-CatBoost, the proposed method shows clear advantages in the evaluation metrics. The accuracy is improved by 2.38, 0.36, 0.36, 0.24, 0.95, and 0.95 percentage points, respectively. For the Macro_Precision, the improvements are 2.10, 0.32, 0.32, 0.21, 0.86, and 0.86 percentage points, respectively. For the Macro_Recall, the improvements are 2.38, 0.36, 0.36, 0.24, 0.95, and 0.95 percentage points, respectively. For the F1-score, the improvements are 2.41, 0.36, 0.36, 0.24, 0.95, and 0.95 percentage points, respectively. In terms of training time, the proposed method is 23.68 s, 6.09 s, and 3.72 s slower than the first three methods, but 8.78 s, 312.39 s, and 483.45 s faster than the last three. Compared to Optuna-XGBoost, Optuna-LightGBM, and HPO-CatBoost, Optuna-CatBoost is therefore slightly inferior in model training time, but its advantage in recognition rate is more prominent. CatBoost is a robust gradient boosting decision tree algorithm with a large number of hyperparameters for tuning its performance. Adjusting these parameters manually is time-consuming, and an optimal configuration is difficult to reach. Optuna is an automatic hyperparameter optimization framework that uses efficient search algorithms to determine the hyperparameter combination. By using Optuna to adaptively adjust the hyperparameter combination of CatBoost, the generalization ability of the classification model can be improved and the risk of overfitting can be reduced.

Similarly, a comparison of the model evaluation metrics obtained by combining HFDE, RCMFDE, CMFDE, and MFDE with the various classifiers also supports the superiority of the adaptive Optuna-CatBoost classifier. Meanwhile, within each classifier, the classification results of TSMFDE are superior to those of the other four feature extraction methods. In addition, this paper also compared Optuna-CatBoost with PNN, ELM, DT, and SVM (with a c value of 50 and a kernel parameter value of 50), and the results are shown in Fig. 15. On the one hand, regardless of the feature extraction method employed, Optuna-CatBoost surpasses the other methods. On the other hand, compared to these four methods, TSMFDE-Optuna-CatBoost improves the average recognition rate by 13.09%, 7.97%, 2.14%, and 0.95%, respectively. Therefore, the superiority of the TSMFDE-Optuna-CatBoost method has been demonstrated.
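
The single-run and averaged recognition rates quoted for the XJTU experiments can be checked against the test-set size. With 420 samples and a 20% test split, each run classifies 84 test samples. The sketch below assumes that the two misidentifications reported for the averaged TSMFDE result occurred in two different runs; the names are illustrative.

```python
def accuracy_percent(n_wrong, n_total):
    """Recognition rate in percent for n_wrong errors out of n_total samples."""
    return 100.0 * (n_total - n_wrong) / n_total


n_test = round(420 * 0.2)  # 84 test samples per run

# Single-run results for 1, 3, 5, 6, and 6 misidentified samples (Fig. 14).
single_run = [round(accuracy_percent(w, n_test), 2) for w in (1, 3, 5, 6, 6)]

# Ten runs: eight without errors and two with one error each (assumption).
run_accuracies = [100.0] * 8 + [accuracy_percent(1, n_test)] * 2
average = round(sum(run_accuracies) / len(run_accuracies), 2)
```

This gives 98.81, 96.43, 94.05, 92.86, and 92.86 percent for the single runs and an average of 99.76 percent over ten runs, in agreement with the reported values.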

Fig. 14.

Fault classification confusion matrix obtained using different feature extraction methods (-10dB).

Table 4.

Optimization of CatBoost parameter values by Optuna (-10dB).

Fault characteristics | Iterations | Depth | Learning rate | Random strength
MFDE | 1077 | 10 | 0.137 | 10
CMFDE | 1096 | 9 | 0.148 | 5
RCMFDE | 1002 | 4 | 0.159 | 2
HFDE | 1093 | 7 | 0.163 | 10
TSMFDE | 1042 | 7 | 0.196 | 7

Table 5.

Seven adaptive classification model evaluation metrics and training time (XJTU, -10 dB).

Methods of fault diagnosis | Accuracy (%) | Macro_Precision (%) | Macro_Recall (%) | F1-score (%) | Training time (s)
MFDE-BO-CatBoost | 94.05 | 94.49 | 94.05 | 94.05 | 705.89
CMFDE-BO-CatBoost | 90.48 | 90.82 | 90.48 | 90.46 | 504.52
RCMFDE-BO-CatBoost | 91.67 | 91.71 | 91.67 | 91.49 | 332.90
HFDE-BO-CatBoost | 95.24 | 95.72 | 95.24 | 95.27 | 417.36
TSMFDE-BO-CatBoost | 98.81 | 98.93 | 98.81 | 98.81 | 521.77
MFDE-PSO-CatBoost | 92.86 | 93.61 | 92.86 | 92.97 | 332.67
CMFDE-PSO-CatBoost | 90.48 | 90.64 | 90.48 | 90.42 | 787.30
RCMFDE-PSO-CatBoost | 89.29 | 89.71 | 89.29 | 89.32 | 470.03
HFDE-PSO-CatBoost | 95.24 | 95.72 | 95.24 | 95.27 | 623.67
TSMFDE-PSO-CatBoost | 98.81 | 98.93 | 98.81 | 98.81 | 350.71
MFDE-SSA-CatBoost | 92.03 | 92.55 | 92.03 | 92.03 | 43.72
CMFDE-SSA-CatBoost | 84.76 | 85.25 | 84.76 | 84.73 | 35.03
RCMFDE-SSA-CatBoost | 85.47 | 85.65 | 85.47 | 85.44 | 47.03
HFDE-SSA-CatBoost | 94.76 | 95.48 | 94.76 | 94.80 | 39.33
TSMFDE-SSA-CatBoost | 99.52 | 99.57 | 99.52 | 99.52 | 47.10
MFDE-HPO-CatBoost | 91.91 | 92.38 | 91.91 | 91.90 | 23.01
CMFDE-HPO-CatBoost | 84.52 | 84.98 | 84.52 | 84.48 | 15.54
RCMFDE-HPO-CatBoost | 84.76 | 84.33 | 84.76 | 84.69 | 19.22
HFDE-HPO-CatBoost | 94.88 | 95.57 | 94.88 | 94.92 | 21.70
TSMFDE-HPO-CatBoost | 99.41 | 99.47 | 99.41 | 99.41 | 34.60
MFDE-Optuna-LightGBM | 92.86 | 93.15 | 92.86 | 92.62 | 87.79
CMFDE-Optuna-LightGBM | 86.90 | 88.04 | 86.90 | 87.07 | 44.15
RCMFDE-Optuna-LightGBM | 85.71 | 85.86 | 85.71 | 85.55 | 62.49
HFDE-Optuna-LightGBM | 96.43 | 96.54 | 96.43 | 96.39 | 65.47
TSMFDE-Optuna-LightGBM | 99.41 | 99.47 | 99.41 | 99.40 | 32.23
MFDE-Optuna-XGBoost | 88.10 | 89.25 | 88.10 | 88.27 | 11.36
CMFDE-Optuna-XGBoost | 84.52 | 85.36 | 84.52 | 84.36 | 23.41
RCMFDE-Optuna-XGBoost | 88.10 | 88.32 | 88.10 | 88.10 | 14.58
HFDE-Optuna-XGBoost | 94.05 | 94.76 | 94.05 | 94.10 | 11.66
TSMFDE-Optuna-XGBoost | 97.38 | 97.69 | 97.38 | 97.34 | 14.64
MFDE-Optuna-CatBoost | 93.57 | 93.93 | 93.57 | 93.56 | 49.31
CMFDE-Optuna-CatBoost | 93.93 | 94.34 | 93.93 | 93.87 | 49.48
RCMFDE-Optuna-CatBoost | 93.57 | 93.75 | 93.57 | 93.51 | 44.38
HFDE-Optuna-CatBoost | 96.79 | 97.07 | 96.79 | 96.81 | 34.66
TSMFDE-Optuna-CatBoost | 99.76 | 99.79 | 99.76 | 99.76 | 38.32

Fig. 15.

Comparison with Other Traditional Classification Methods (-10 dB).

In summary, the method based on FCEEMD and MIC/Gini in this paper has good signal denoising performance. TSMFDE can effectively extract the fault features of bearings under complex variable working conditions. The adaptive Optuna-CatBoost classifier achieves strong fault classification performance.

Experimental verification of UO datasets

Introduction to the UO experimental platform

To further validate the effectiveness of the proposed method, we conducted fault diagnosis experiments on the bearing vibration datasets provided by the University of Ottawa in Canada. These datasets were collected under complex variable operating conditions. The experiment was conducted on a Spectra Quest Mechanical Fault Simulator (MFS-PK5M). The test bench consists of an electric motor, AC drive, encoder (model 775), coupling, rotor, and bearings. The shaft is driven by the electric motor, and the speed is controlled by the AC drive. The experimental setup is shown in Fig. 16. The specific model of the bearing is shown in Table 6, where $f_r$ is the rotational frequency.

Fig. 16.

Experimental setup for bearings at the University of Ottawa.

Table 6.

Bearing model.

Model | Pitch diameter (mm) | Ball diameter (mm) | Number of balls | BPFI | BPFO
ER16K | 38.52 | 7.94 | 9 | 5.43$f_r$ | 3.57$f_r$

The experimental data contain vibration signals and speed signals of bearings in different health states under time-varying speed conditions. The vibration data is the Channel_1 signal measured by the accelerometer, and the speed data is the Channel_2 signal measured by the encoder. The health status of the bearings includes 5 categories: healthy, inner ring failure, outer ring failure, ball failure, and composite failure. The time-varying speed conditions comprise four types: acceleration, deceleration, acceleration followed by deceleration, and deceleration followed by acceleration. The sampling frequency of the vibration signal is 200 kHz, and the sampling duration is 10 s. This paper selects the vibration signals collected under acceleration conditions to classify and identify bearing faults under variable operating conditions. Following the sample collection method in Sect. 4.2, 60 samples of 4096 data points each are collected for each health state, giving a total of 300 samples for the 5 health states. Of these samples, 80% were randomly assigned to the training set, while the remaining 20% were used for testing. A total of 10 bearing fault classification experiments were conducted with this partitioning. The data description is shown in Table 7. It includes the fault types, fault IDs (BA1, CA1, HA1, IA1, and OA1, where A represents acceleration), fault labels (1 to 5, corresponding to the fault IDs), and the proportions of the training and testing sets. The time-domain waveform of the vibration signal is shown in Fig. 17. Through the subsequent steps of signal reconstruction, feature extraction, and classification, the fault diagnosis of bearings under complex variable working conditions is achieved.

Table 7.

Description of 5 working states of bearing.

Fault type | ID | Label | Train size | Test size
Ball malfunction | BA1 | 1 | 0.8 | 0.2
Composite fault | CA1 | 2 | 0.8 | 0.2
Healthy | HA1 | 3 | 0.8 | 0.2
Inner ring failure | IA1 | 4 | 0.8 | 0.2
Outer ring failure | OA1 | 5 | 0.8 | 0.2

Fig. 17.

Time domain waveform of vibration signals for different types of faults (taking acceleration as an example).

Signal reconstruction and feature extraction (UO)

Similarly, -10 dB white noise was added to the vibration signals of the five bearing health states under complex variable conditions. Subsequently, the noisy signal is decomposed by FCEEMD. Taking the vibration signal of a composite fault bearing as an example, the time-domain diagrams of each component are shown in Fig. 18(a). It can be seen that IMF1-IMF4 have obvious periodic characteristics. The MIC/Gini index values of these components also show a clear decreasing trend after sorting, with values of 1.361, 0.4581, 0.3355, and 0.24, respectively, as shown in Fig. 19. However, IMF5-IMF8 contain chaotic noise components. Their MIC/Gini index tends to be stable, and the critical value can be set as the threshold. As shown in the frequency domain diagram in Fig. 18(b), no mode aliasing is observed between the components as the frequency transitions from high to low. Therefore, the first 4 IMF components are selected to reconstruct the vibration signal, thereby achieving noise reduction.

Fig. 18.


The first N IMF components obtained by FCEEMD decomposition of the composite-fault vibration signal (−10 dB).
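The noise-mixing step can be illustrated with a short sketch. It assumes that the quoted noise level is a signal-to-noise ratio in dB, which the text does not state explicitly; the function names `add_white_noise` and `measured_snr_db` and the sine test signal are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal + Gaussian white noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # From SNR_dB = 10 * log10(P_signal / P_noise), solve for the noise power.
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_p_noise / p_noise)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean version."""
    p_signal = sum(x * x for x in clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)


# Toy sine wave standing in for one 4096-point vibration sample.
clean = [math.sin(2 * math.pi * 50 * t / 4096) for t in range(4096)]
noisy = add_white_noise(clean, snr_db=-10.0)
print(round(measured_snr_db(clean, noisy), 6))  # -10.0
```

At −10 dB under this reading, the noise power is ten times the signal power, which is why amplitude differences alone no longer separate the fault types.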

Fig. 19.


IMF components of the composite-fault vibration signal arranged in descending order of the composite screening index (−10 dB).
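The screening-and-reconstruction step (rank the IMFs by their index value, keep the top-ranked ones, and sum them) can be sketched as below. The function name `reconstruct_from_top_imfs` and the toy components and scores are illustrative assumptions and are not values from the paper; how the index itself is computed is not reproduced here.

```python
def reconstruct_from_top_imfs(imfs, scores, n):
    """Sum the n IMFs with the highest screening scores.

    imfs   : list of equal-length sequences, one per IMF
    scores : one screening-index value per IMF (higher means more relevant)
    n      : number of top-ranked IMFs to keep
    """
    if len(imfs) != len(scores):
        raise ValueError("need exactly one score per IMF")
    if not 1 <= n <= len(imfs):
        raise ValueError("n must be between 1 and the number of IMFs")
    # Rank IMF indices by score, highest first, and keep the first n.
    order = sorted(range(len(imfs)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:n])
    length = len(imfs[0])
    signal = [sum(imfs[i][t] for i in kept) for t in range(length)]
    return signal, kept


# Toy components and scores, purely illustrative (not taken from the paper).
toy_imfs = [
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, -0.5, -0.5],
    [0.01, -0.02, 0.03, 0.0],
]
toy_scores = [1.2, 0.4, 0.05]
denoised, kept = reconstruct_from_top_imfs(toy_imfs, toy_scores, n=2)
print(kept)      # [0, 1]
print(denoised)  # [1.5, -0.5, 0.5, -1.5]
```

Sweeping `n` from 1 up to the number of IMFs gives the family of reconstructions whose classification results can then be compared.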

Using the above analysis procedure, the vibration signals of each category are decomposed, screened, and reconstructed. Because of the noise, the type of bearing fault cannot be identified solely from the differences in amplitude and impact characteristics between the signals, so TSMFDE is used to extract features from the vibration signals. The mean entropy curves of the different FDEs for the different fault types under variable operating conditions are shown in Fig. 20(a1)–(e1). Consistent with the experimental analysis on the XJTU dataset, the advantage of TSMFDE lies in the significant differences between the features of different fault types and in its relatively low redundancy. The three-dimensional scatter plot in Fig. 20(a2) shows that, although the bearing vibration signals mixed with −10 dB white noise have undergone noise reduction and reconstruction, the differences in feature values between bearing health states remain relatively small. Nevertheless, the features of each category are tightly aggregated with little overlap between categories, which preliminarily suggests that the adaptive CatBoost model proposed in this paper can classify them effectively. HFDE, RCMFDE, CMFDE, and MFDE are used for comparative analysis, and the distributions of their feature values are shown in Fig. 20(b2)–(e2), respectively. For these methods, the feature values differ little between categories and some features are excessively clustered. In particular, when HFDE is used to extract features, only the IA1 fault features show slight differences. Overall, compared with the other four feature extraction results, TSMFDE shows relatively small feature fluctuations, high discriminability, and greater stability, and the effect is more pronounced on the UO dataset than on the XJTU dataset. This further confirms that TSMFDE is robust to time-series length and has good recognition ability. Therefore, this study employs TSMFDE to extract the fault features of bearings under complex variable conditions. The resulting feature matrix has a size of 5 × 60 × 20.

Fig. 20.


Distribution of entropy values of the five types of FDEs for the five health states of the bearings (−10 dB).
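The time-shift idea behind TSMFDE can be sketched as follows: at scale k the series is split into k interleaved sub-series, an entropy is computed on each, and the results are averaged, so no data points are discarded during coarse-graining. This follows the usual time-shift multiscale construction and may differ in detail from the paper's definition. The histogram-based `shannon_entropy` is only a stand-in and is not fuzzy dispersion entropy, and the maximum scale of 20 is an assumption chosen to mirror the last dimension of the 5 × 60 × 20 feature matrix.

```python
import math
from collections import Counter


def time_shift_subsequences(x, scale):
    """Split x into `scale` interleaved sub-series x[b], x[b+scale], ..."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    return [x[b::scale] for b in range(scale)]


def shannon_entropy(x, n_bins=4):
    """Stand-in entropy: Shannon entropy of a histogram (NOT fuzzy dispersion entropy)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0
    bins = Counter(min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1) for v in x)
    total = len(x)
    return -sum(c / total * math.log(c / total) for c in bins.values())


def time_shift_multiscale(x, max_scale, entropy_fn=shannon_entropy):
    """Average entropy_fn over all time-shifted sub-series at each scale."""
    features = []
    for scale in range(1, max_scale + 1):
        subs = time_shift_subsequences(x, scale)
        features.append(sum(entropy_fn(s) for s in subs) / scale)
    return features


print(time_shift_subsequences(list(range(10)), 3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

sample = [math.sin(0.05 * t) + 0.3 * math.sin(1.7 * t) for t in range(4096)]
features = time_shift_multiscale(sample, max_scale=20)
print(len(features))  # 20
```

Passing a real fuzzy dispersion entropy implementation as `entropy_fn` would turn this skeleton into a TSMFDE-style feature extractor.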

Complex variable condition fault identification (UO)

First, the N IMF components obtained from FCEEMD decomposition are sorted in descending order of the composite screening index. The vibration signal mixed with −10 dB white noise is then reconstructed from the first n components in turn, for each candidate value of n. Finally, the adaptive Optuna-CatBoost classifier is used for classification and recognition, which validates the effectiveness of the FCEEMD-TSMFDE feature extraction method. The classification results are shown in Fig. 21(a). As long as only IMF1-IMF4 are involved, the recognition rate increases as IMF components are added, which indicates that IMF1-IMF4 are strongly correlated with the original signal. Once components from IMF5-IMF8 are included, the recognition rate declines markedly or fluctuates, which indicates that IMF5-IMF8 are false components or noise components. This is consistent with the conclusion in Sect. 4.2.3 that reconstructing the bearing vibration signal from the first 4 IMF components effectively achieves noise reduction. With this reconstruction, TSMFDE achieves the highest average recognition rate, 99.33%: of the 10 experiments, 4 contained a misidentification and the remaining 6 reached a 100% recognition rate. The other four FDEs (HFDE, RCMFDE, CMFDE, and MFDE) reach 85.67%, 94%, 94%, and 93.67%, respectively, which is 13.66%, 5.33%, 5.33%, and 5.66% lower than TSMFDE. For the bearing vibration signal mixed with −6 dB white noise, the experimental results are shown in Fig. 21(b). When only the first IMF component is extracted, the average recognition rate of TSMFDE reaches 100%, which is 0.33% higher than when the first 8 IMF components are extracted, and TSMFDE outperforms the other feature extraction methods. The fault identification results for the remaining two signals (with −4 dB and −2 dB white noise) are shown in Fig. 21(c) and (d). When the first 2 IMF components are extracted, the average recognition rate of TSMFDE reaches its optimum, and the other four FDEs are again inferior to TSMFDE. As the number of IMF components increases, the recognition rate again first rises and then falls. These results once more demonstrate the superiority of the FCEEMD-TSMFDE method.

Fig. 21.


Classification recognition rate of bearing faults obtained from different feature dimensions and different types of FDE.
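The 99.33% average for TSMFDE at −10 dB is consistent with simple arithmetic over the 10 experiments, as the short check below shows. It assumes a 60-sample test set (20% of 300 samples) and exactly one misidentified sample in each of the 4 imperfect runs; the second assumption is not stated in the text and is used here only for illustration.

```python
from statistics import mean

TEST_SIZE = 60  # 20% of the 300 samples

# Assumed error counts for the 10 runs: one error in each of 4 runs, none in 6.
errors_per_run = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

rates = [100.0 * (TEST_SIZE - e) / TEST_SIZE for e in errors_per_run]
average_rate = mean(rates)
print(round(average_rate, 2))  # 99.33
```

Under these assumptions each imperfect run scores 59/60, or 98.33%, and the average over all 10 runs is 99.33%.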

The single-run classification results for the bearing fault vibration signals mixed with −10 dB white noise were also analyzed in detail. Figure 22(a) presents the confusion matrix for TSMFDE: only one OA1 sample (label 5) was incorrectly identified as CA1 (label 2), giving a recognition rate of 98.33%. Table 8 lists the optimal parameter combinations of the adaptive Optuna-CatBoost classifier when each type of FDE is used as the fault feature during training and testing. For TSMFDE, the classifier uses Iterations = 1073, Depth = 7, Learning_rate = 0.211, and Random_strength = 4. This parameter adaptability allows the classifier to identify the various fault types of bearings under complex variable working conditions, thereby supporting effective warning and diagnosis. Figure 22(b) shows that, with HFDE as the fault feature, 9 samples were misidentified across 4 fault types. Among them, only the faults labeled IA1 and HA1 were correctly identified, and the feature distribution is consistent with these classification results. Figure 22(c) illustrates that RCMFDE misidentified 3 samples across 2 types. Figure 22(d) and (e) show that CMFDE and MFDE each misidentified 4 samples across 3 types. Compared with HFDE, RCMFDE, CMFDE, and MFDE, TSMFDE raised the recognition rate by 13.33%, 3.33%, 5%, and 5%, respectively. In addition, six other adaptive classification methods were compared with Optuna-CatBoost to verify its superiority; the results are given in Table 9. The combination of TSMFDE and Optuna-CatBoost achieved a bearing fault classification accuracy of 99.33%, a Macro Precision of 99.39%, a Macro Recall of 99.33%, an F1-score of 99.34%, and a model training time of 39.33 s. Compared with TSMFDE-Optuna-XGBoost, TSMFDE-Optuna-LightGBM, TSMFDE-HPO-CatBoost, TSMFDE-SSA-CatBoost, TSMFDE-PSO-CatBoost, and TSMFDE-BO-CatBoost in terms of evaluation metrics and training time, our method still shows clear advantages. Notably, the results on the UO dataset are more pronounced than those on the XJTU dataset.

Fig. 22.


Fault classification confusion matrix obtained using different types of multiscale FDEs (-10dB).
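The evaluation metrics can be derived from a confusion matrix as sketched below. The matrix encodes the single error described for the TSMFDE case (one OA1 sample predicted as CA1) and assumes 12 test samples per class, which the text does not state. Macro F1 is taken here as the mean of the per-class F1 values, one common convention that may differ from the one used by the authors.

```python
LABELS = ["BA1", "CA1", "HA1", "IA1", "OA1"]

# Rows are true classes, columns are predicted classes. Assumed 12 test
# samples per class; one OA1 sample is predicted as CA1.
confusion = [
    [12, 0, 0, 0, 0],
    [0, 12, 0, 0, 0],
    [0, 0, 12, 0, 0],
    [0, 0, 0, 12, 0],
    [0, 1, 0, 0, 11],
]


def macro_metrics(cm):
    """Return accuracy, macro precision, macro recall, and macro F1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


acc, prec, rec, f1 = macro_metrics(confusion)
print(f"{100 * acc:.2f} {100 * prec:.2f} {100 * rec:.2f} {100 * f1:.2f}")
# 98.33 98.46 98.33 98.33
```

With balanced classes the macro recall equals the accuracy, and the accuracy of 59/60 reproduces the 98.33% quoted for this run.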

Table 8.

Optuna-optimized CatBoost parameter values (−10 dB).

Fault feature Iterations Depth Learning_rate Random_strength
MFDE 1078 6 0.134 4
CMFDE 1097 10 0.198 5
RCMFDE 1081 4 0.168 5
HFDE 1013 10 0.078 3
TSMFDE 1073 7 0.211 4
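For reuse, the tuned values in Table 8 can be held in a small lookup structure. The sketch below only transcribes the table. The dictionary keys follow the keyword names of CatBoost's Python API (`iterations`, `depth`, `learning_rate`, `random_strength`), and the names `TABLE8_PARAMS` and `catboost_kwargs` are hypothetical.

```python
# Values transcribed from Table 8 (-10 dB); keys use CatBoost keyword names.
TABLE8_PARAMS = {
    "MFDE": {"iterations": 1078, "depth": 6, "learning_rate": 0.134, "random_strength": 4},
    "CMFDE": {"iterations": 1097, "depth": 10, "learning_rate": 0.198, "random_strength": 5},
    "RCMFDE": {"iterations": 1081, "depth": 4, "learning_rate": 0.168, "random_strength": 5},
    "HFDE": {"iterations": 1013, "depth": 10, "learning_rate": 0.078, "random_strength": 3},
    "TSMFDE": {"iterations": 1073, "depth": 7, "learning_rate": 0.211, "random_strength": 4},
}


def catboost_kwargs(feature_name):
    """Return a copy of the tuned parameters for one feature type."""
    return dict(TABLE8_PARAMS[feature_name])


print(catboost_kwargs("TSMFDE"))
# {'iterations': 1073, 'depth': 7, 'learning_rate': 0.211, 'random_strength': 4}
```

If the `catboost` package is installed, such a dictionary could be unpacked into `CatBoostClassifier(**catboost_kwargs("TSMFDE"))`; the Optuna search that produced these values is not reproduced here.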
Table 9.

Evaluation metrics and training times of the seven adaptive classification models (UO, −10 dB).

Method Accuracy (%) Macro_Precision (%) Macro_Recall (%) F1-score (%) Training time (s)
MFDE-BO-CatBoost 95.00 96.25 95.00 95.21 384.66
CMFDE-BO-CatBoost 95.00 95.44 95.00 95.02 227.18
RCMFDE-BO-CatBoost 95.00 95.44 95.00 95.02 464.92
HFDE-BO-CatBoost 85.00 87.31 85.00 84.73 184.13
TSMFDE-BO-CatBoost 96.67 97.04 96.67 96.69 315.96
MFDE-PSO-CatBoost 95.00 96.25 95.00 95.21 523.97
CMFDE-PSO-CatBoost 93.33 94.42 93.33 93.46 588.20
RCMFDE-PSO-CatBoost 93.33 94.42 96.33 93.46 389.56
HFDE-PSO-CatBoost 85.00 89.90 85.00 84.50 813.80
TSMFDE-PSO-CatBoost 98.33 98.50 98.33 98.35 375.60
MFDE-SSA-CatBoost 95.00 96.25 95.00 95.21 46.47
CMFDE-SSA-CatBoost 93.33 94.42 93.33 93.46 43.43
RCMFDE-SSA-CatBoost 93.33 94.42 93.33 93.46 46.34
HFDE-SSA-CatBoost 86.67 90.47 86.67 86.36 42.68
TSMFDE-SSA-CatBoost 97.67 97.78 97.67 97.68 36.76
MFDE-HPO-CatBoost 95.00 96.25 95.00 95.21 16.96
CMFDE-HPO-CatBoost 93.33 94.42 93.33 93.46 19.19
RCMFDE-HPO-CatBoost 93.33 94.42 93.33 93.46 21.30
HFDE-HPO-CatBoost 86.67 90.47 86.67 86.36 19.65
TSMFDE-HPO-CatBoost 96.67 96.83 96.67 96.69 13.07
MFDE-Optuna-LightGBM 93.33 94.24 93.33 93.28 36.70
CMFDE-Optuna-LightGBM 93.33 94.42 93.33 93.46 23.81
RCMFDE-Optuna-LightGBM 93.33 94.42 93.33 93.46 21.33
HFDE-Optuna-LightGBM 88.33 88.09 88.33 88.06 34.22
TSMFDE-Optuna-LightGBM 97.00 97.30 97.00 96.99 25.99
MFDE-Optuna-XGBoost 93.33 94.68 93.33 93.43 15.18
CMFDE-Optuna-XGBoost 91.67 92.92 91.67 91.77 14.44
RCMFDE-Optuna-XGBoost 91.67 92.96 91.67 91.70 5.51
HFDE-Optuna-XGBoost 88.33 91.13 88.33 88.15 10.07
TSMFDE-Optuna-XGBoost 94.17 94.68 94.17 94.20 13.49
MFDE-Optuna-CatBoost 93.67 95.17 93.67 93.84 45.26
CMFDE-Optuna-CatBoost 94.00 94.83 94.00 94.08 30.06
RCMFDE-Optuna-CatBoost 94.00 94.83 94.00 94.08 61.58
HFDE-Optuna-CatBoost 85.67 89.57 85.67 85.38 33.93
TSMFDE-Optuna-CatBoost 99.33 99.39 99.33 99.34 39.33

Relative to TSMFDE combined with Optuna-XGBoost, Optuna-LightGBM, HPO-CatBoost, SSA-CatBoost, PSO-CatBoost, and BO-CatBoost, the accuracy of TSMFDE-Optuna-CatBoost improved by 5.17%, 2.33%, 2.66%, 1.66%, 1.00%, and 2.66%, respectively. Macro Precision increased by 4.72%, 2.10%, 2.56%, 1.61%, 0.89%, and 2.35%, Macro Recall by 5.17%, 2.33%, 2.66%, 1.66%, 1.00%, and 2.66%, and the F1-score by 5.13%, 2.34%, 2.65%, 1.65%, 0.99%, and 2.65%, respectively. In training time, Optuna-CatBoost was 25.84 s, 13.34 s, 26.26 s, and 2.57 s slower than Optuna-XGBoost, Optuna-LightGBM, HPO-CatBoost, and SSA-CatBoost, but 336.27 s and 276.63 s faster than PSO-CatBoost and BO-CatBoost. Although Optuna-CatBoost lags slightly behind the first four classifiers in training time, its advantage in recognition rate is more prominent. Furthermore, the evaluation metrics obtained by combining HFDE, RCMFDE, CMFDE, and MFDE in turn with the various classification methods continue to confirm the superiority of the adaptive Optuna-CatBoost classification method. Within each classifier, TSMFDE also outperforms the other four feature extraction methods. The method was further compared with classical methods such as PNN, ELM, DT, and SVM, and the results are shown in Fig. 23. Whichever feature extraction method is used, Optuna-CatBoost remains superior to the other methods. Compared with PNN, ELM, DT, and SVM, TSMFDE-Optuna-CatBoost improved the average recognition rate by 18.83%, 32.16%, 5.66%, and 2%, respectively. Therefore, the superiority of the TSMFDE-Optuna-CatBoost method has once again been demonstrated.

Fig. 23.


Comparison with Other Traditional Classification Methods (-10dB).
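The comparisons between TSMFDE-Optuna-CatBoost and the six other adaptive classifiers can be recomputed directly from the TSMFDE rows of Table 9, as in the sketch below. The variable names are hypothetical. Because the table entries are rounded, a difference computed this way can deviate from a figure quoted in the text by about 0.01.

```python
# TSMFDE rows of Table 9: accuracy (%), macro precision (%), macro recall (%),
# F1-score (%), training time (s).
TSMFDE_ROWS = {
    "Optuna-XGBoost": (94.17, 94.68, 94.17, 94.20, 13.49),
    "Optuna-LightGBM": (97.00, 97.30, 97.00, 96.99, 25.99),
    "HPO-CatBoost": (96.67, 96.83, 96.67, 96.69, 13.07),
    "SSA-CatBoost": (97.67, 97.78, 97.67, 97.68, 36.76),
    "PSO-CatBoost": (98.33, 98.50, 98.33, 98.35, 375.60),
    "BO-CatBoost": (96.67, 97.04, 96.67, 96.69, 315.96),
}
PROPOSED = (99.33, 99.39, 99.33, 99.34, 39.33)  # TSMFDE-Optuna-CatBoost

# Accuracy gain of the proposed method, in percentage points.
accuracy_gain = {
    name: round(PROPOSED[0] - row[0], 2) for name, row in TSMFDE_ROWS.items()
}
# Extra training time of the proposed method; negative means it is faster.
extra_time = {
    name: round(PROPOSED[4] - row[4], 2) for name, row in TSMFDE_ROWS.items()
}

for name in TSMFDE_ROWS:
    print(f"{name:16s} gain {accuracy_gain[name]:5.2f}  extra time {extra_time[name]:8.2f} s")
```

The sign of `extra_time` makes the trade-off explicit: the proposed method is slower than four of the classifiers by at most about 26 s, and faster than PSO-CatBoost and BO-CatBoost by several hundred seconds.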

In summary, the proposed method in this paper has been rigorously compared through detailed experiments, confirming its superiority.

Conclusion

This paper proposes a fault diagnosis method for complex variable operating conditions based on FCEEMD signal processing, TSMFDE feature extraction, and Optuna-CatBoost classification. The method combines the strengths of FCEEMD for decomposing variable-speed vibration signals, the screening index for evaluating signal denoising and reconstruction, TSMFDE for feature extraction, and Optuna-CatBoost for classification and recognition. Fault diagnosis experiments under complex variable operating conditions confirm the effectiveness of the proposed method and its advance over existing methods. Moreover, compared with deep learning methods, it requires fewer feature samples to classify and recognize the different health states of bearings. The main conclusions of this paper are as follows:

  1. In decomposing, denoising, and reconstructing vibration signals under complex variable working conditions, the proposed method uses FCEEMD to decompose IMF components effectively from high to low frequencies while suppressing the mode aliasing phenomenon. In addition, the composite screening method effectively filters out false components and noise components from the vibration signals, thus achieving noise reduction for bearing vibration signals under complex variable speeds and yielding the optimal reconstructed signal. Compared to EMD, LMD, VMD, FMD, TVF-EMD, and CEEMDAN, the evaluation index of the simulated vibration signals increased by 17.62%, 10.48%, 75.48%, 12.38%, 77.39%, and 1.16%, respectively.

  2. By constructing time-shift sub-series, the insufficient coarse-graining of the time series in the MFDE algorithm was remedied. The resulting TSMFDE features effectively characterize the different health states of bearings in complex variable working environments. Compared with the four FDEs (MFDE, CMFDE, RCMFDE, and HFDE), this method achieves the highest classification accuracy: with Optuna-CatBoost for fault diagnosis, the recognition rates increased by 6.19%, 5.83%, 6.19%, and 3% on the XJTU dataset and by 5.66%, 5.33%, 5.33%, and 13.66% on the UO dataset, respectively. Moreover, the TSMFDE feature distribution not only effectively avoids feature loss but also improves the computational efficiency of the classification methods.

  3. The proposed Optuna-CatBoost classification method adaptively selects key hyperparameters such as Iterations, Depth, Learning_rate, and Random_strength based on the distribution pattern of the feature samples. In fault diagnosis experiments on bearings under complex variable working conditions (two publicly available datasets), the method accurately identifies the various health states, including composite faults, with recognition rates of 99.76% and 99.33%, respectively. Compared with ten classification methods (BO-CatBoost, PSO-CatBoost, SSA-CatBoost, HPO-CatBoost, Optuna-LightGBM, Optuna-XGBoost, PNN, ELM, DT, and SVM), its recognition rate improved by 0.95%, 0.95%, 0.24%, 0.35%, 0.35%, 2.38%, 13.09%, 7.97%, 2.14%, and 0.95% on the XJTU dataset and by 2.66%, 1%, 1.66%, 2.66%, 2.33%, 5.16%, 18.83%, 32.16%, 5.66%, and 2% on the UO dataset, respectively. These comparative experiments demonstrate the superior classification performance of our method.

However, the method presented in this paper is specifically tailored for analyzing bearing faults, and its transferability to other applications requires further improvement and optimization. In subsequent research, we plan to apply this method to the fault diagnosis of other critical equipment, such as gearboxes, transformers, and wind turbines. We will also enhance comparative studies to ascertain the effectiveness and universality of the signal reconstruction, feature extraction, and adaptive classification methods proposed in this paper. In addition, we will continue to explore and apply the most advanced technologies or theories to solve more complex and challenging variable condition fault diagnosis problems.

Acknowledgements

The authors sincerely thank the team for their guidance, and thank Xi’an Jiaotong University and the University of Ottawa for their bearing datasets and check valve datasets. The authors also sincerely thank the reviewers for taking the time to review the paper despite their busy schedules.

Author contributions

Conceptualization, M.M., and CJ.Z.; Methodology, M.M.; Software, M.M.; Validation, B.X., C.Z. and J.Y.; Formal analysis, M.M.; Investigation, M.M.; Resources, CJ.Z.; Data curation, Y.S., K.T., and Y.W.; Writing—original draft, M.M., Y.S., K.T., and Y.W.; Writing—review and editing, CJ.Z., B.X., and J.Y.; Visualization, M.M., C.Z. and Y.S.; Supervision, B.X., C.Z. and J.Y.; Project administration, M.M., CJ.Z., and J.Y.; Funding acquisition, M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the General Scientific Research Project Funding from Zhejiang Provincial Department of Education (No. Y202353293), the Project of Quzhou Science and Technology Plan (No. 2021K31), the National Natural Science Foundation of China (No. 62363036), the PhD research startup foundation of Yunnan Normal University (No. 01000205020503131), the Fundamental Research Program of Yunnan Province (No. 202201AU070055), Yunnan Fundamental Research Projects (No. 202301AT070256), the Baoshan Xingbao Young Talent Training Project (No. 202303), and the 10th batch of the Baoshan young and middle-aged leaders training project in academic and technical (202109).

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Chengjiang Zhou, Email: chengjiangzhou@foxmail.com.

Jingzong Yang, Email: yjingzong@foxmail.com.




Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
