|
Algorithm 4: HSACEC hybrid sampling algorithm |
Input: Dataset D containing N classes; equilibrium sampling number m; classifier algorithm h; number of fractional samples T.
Output: Equilibrium sample set Q
-
1.
Set the oversampling rate for each minority class according to Formulas (3) and (4), use the SMOTE algorithm to synthesize new samples for each minority class, and add them to dataset D;
-
2.
Calculate the rated sampling quantity z for a single sampling round;
-
3.
Cluster the samples of each class in dataset D using the K-means algorithm, generating z clusters per class, and extract the representative point of each cluster. In total, N*z samples are extracted and added to the balanced dataset Q to initialize it, and the sampling count is recorded. Then remove the extracted samples from dataset D;
-
4.
for each sampling round /* repeat steps 5–15 */
-
5.
Train classifier h using the balanced dataset Q;
-
6.
for each majority class i in dataset D
-
7.
Cluster the sample set Si of class i in dataset D using the K-means algorithm to generate clusters;
-
8.
Classify the samples in Si with classifier h, and calculate the average classification error rate of the samples in each cluster of Si;
-
9.
Sort the average classification error rates of all clusters in descending order, select the top-ranked clusters (those with the largest error rates), extract the representative points of these clusters and add them to the balanced dataset Q, and then remove the extracted samples from dataset D;
-
10.
end for
-
11.
for each non-majority class j in dataset D
-
12.
Cluster the sample set Sj of class j in dataset D using the K-means algorithm to generate clusters;
-
13.
Extract the representative point of each cluster and add it to the balanced dataset Q, then remove the extracted samples from dataset D;
-
14.
end for
-
15.
end for
-
16.
return: equilibrium sample set Q
|
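Steps 8 and 9 of Algorithm 4 (ranking the clusters of a majority class by average classification error and extracting representative points from the hardest ones) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the algorithm box: the cluster assignments and the predictions of classifier h are taken as given, the representative point of a cluster is assumed to be the sample closest to the cluster mean, and the number of selected clusters is passed in as a hypothetical parameter `k`. All function names are illustrative only.

```python
from collections import defaultdict
from math import dist


def cluster_error_rates(cluster_ids, y_true, y_pred):
    """Average misclassification rate of the samples in each cluster (step 8)."""
    wrong = defaultdict(int)
    total = defaultdict(int)
    for c, t, p in zip(cluster_ids, y_true, y_pred):
        total[c] += 1
        wrong[c] += int(t != p)
    return {c: wrong[c] / total[c] for c in total}


def representative_point(indices, X):
    """Index of the sample nearest the cluster mean.

    Assumption: the source does not define "representative point";
    nearest-to-centroid is used here for illustration only.
    """
    dim = len(X[indices[0]])
    centroid = [sum(X[i][d] for i in indices) / len(indices) for d in range(dim)]
    return min(indices, key=lambda i: dist(X[i], centroid))


def select_hard_representatives(X, cluster_ids, y_true, y_pred, k):
    """Pick representatives of the k clusters with the highest error (step 9).

    `k` is a hypothetical parameter; the algorithm box does not show
    how many clusters are selected.
    """
    rates = cluster_error_rates(cluster_ids, y_true, y_pred)
    # Descending error rate; ties broken by cluster id for determinism.
    ranked = sorted(rates, key=lambda c: (-rates[c], c))
    members = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        members[c].append(i)
    return [representative_point(members[c], X) for c in ranked[:k]]


# Toy data: nine samples of one majority class (label 0) in three clusters.
X = [(0, 0), (0, 2), (0, 1),
     (10, 10), (10, 12), (10, 11),
     (20, 20), (20, 22), (20, 21)]
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_true = [0] * 9
y_pred = [0, 0, 0, 1, 1, 0, 1, 0, 0]  # stand-in for the output of classifier h

picked = select_hard_representatives(X, cluster_ids, y_true, y_pred, k=2)
print(picked)  # indices of the samples that would be moved from D to Q
```

In the full procedure the returned indices would be added to Q and removed from D, after which classifier h is retrained on the enlarged Q in the next round. Ranking by error rate steers the sampling of majority classes toward regions the current classifier handles poorly.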