Under-sampling |
●Prototype generation (PG) |
Generates new samples from the original samples to balance the classes: the majority class samples are clustered with K-means, and the cluster centroids replace the original majority samples [31]. |
●Random under-sampling (RUS) |
Samples are randomly removed from the majority class until the classes are balanced. |
●Edited nearest neighbor (ENN) |
Applies the nearest-neighbors algorithm to edit the dataset, removing samples whose class label does not agree sufficiently with the labels of their nearest neighbors [32]. |
●All-KNN (ALLKNN) |
Applies ENN several times, varying the number of nearest neighbors on each pass [32]. |
Oversampling |
●Naive random over-sampling (ROS) |
Minority class samples are randomly drawn with replacement and added to the existing sample set, which increases the weight of the minority class. |
●Synthetic minority oversampling technique (SMOTE) |
For each minority class sample, the k nearest minority class samples are identified. A minority sample point is then randomly selected, one of its k neighbors is randomly selected, and a new sample point is obtained by interpolating between the two. This is repeated until enough minority class samples have been generated to balance the data [33]. |
●Borderline-SMOTE |
An improved variant of SMOTE. The minority class sample points are divided into “noise points”, “dangerous points”, and “safe points” according to how many of their nearest neighbors belong to the majority class, and only the dangerous points are used as starting points for generating new samples [34]. |
●SMOTENC |
An improved variant of SMOTE. Standard SMOTE cannot properly compute distances for, or interpolate between, categorical variables. SMOTENC uses the value difference metric (VDM) algorithm to calculate the distance between categorical values, which allows it to process categorical variables [33]. |
●ADASYN |
Similar to SMOTE, it is based on k nearest neighbors and interpolation. The difference is that ADASYN also considers samples of other classes when finding the k nearest neighbors, and generates more new samples for minority points that have more majority class neighbors [35]. |
Mixed sampling |
●SMOTEENN |
The SMOTE method is used to generate new minority class samples. There may be some noisy samples in the new samples. Apply the ENN method to remove noisy samples and obtain cleaner data [36]. |
●SMOTE-Tomek Links |
Similar to SMOTEENN, but Tomek Links are applied instead of ENN to remove noisy samples and obtain cleaner data [37]. |
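
The three simplest strategies in the table (RUS, ROS, and SMOTE) can be illustrated with a minimal sketch. It uses only the Python standard library and toy two-dimensional points; the function names, the toy data, and the choice of Euclidean distance are assumptions made for illustration, not the implementation used in the cited works.

```python
import math
import random


def random_under_sample(majority, n_keep, rng):
    """RUS: keep a random subset of the majority class (no replacement)."""
    return rng.sample(majority, n_keep)


def random_over_sample(minority, n_total, rng):
    """ROS: keep the originals and add duplicates drawn with replacement."""
    extra = [rng.choice(minority) for _ in range(n_total - len(minority))]
    return minority + extra


def smote(minority, n_new, k, rng):
    """SMOTE-style synthesis: interpolate between a randomly chosen
    minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        # (Euclidean distance is an assumption of this sketch)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy imbalanced data: 20 majority points, 5 minority points (hypothetical)
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(5)]

rus = random_under_sample(majority, len(minority), rng)
ros = random_over_sample(minority, len(majority), rng)
new_points = smote(minority, len(majority) - len(minority), k=3, rng=rng)
smote_minority = minority + new_points
```

After resampling, `rus` has as many points as the minority class, while `ros` and `smote_minority` have as many as the majority class. Unlike ROS, which only repeats existing points, the SMOTE-style points lie on line segments between existing minority points.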