|
Algorithm 1 EOEH |
Input: dataset Y with dimension n and sample number m; number of subsample sets T; number of samples in each subsample set; abnormal entropy weight; normal entropy weight; neighborhood parameter K
Output: ensemble anomaly score set O
-
1:
Begin
-
2:
for t = 1 to T do
-
3:
t-th subsample set = Random() // The subsample set is drawn from the dataset Y by random sampling without replacement
-
4:
End for
-
5:
a = feature set {a_1, a_2, ..., a_n} of dataset Y
-
6:
for t = 1 to T do
-
7:
for i = 1 to n do
-
8:
// Calculate the subsample eigenentropy of point q on the t-th subsample set with respect to feature a_i
-
9:
if the subsample eigenentropy of q with respect to a_i exceeds the average of the subsample eigenentropy for the other points in the sub-sample set then
-
10:
Mark a_i as an abnormal feature of q // Point q belongs to its abnormal feature subspace in the t-th sub-sample set, which is formed by all the abnormal features of q in that set.
-
11:
End if
-
12:
End for
-
13:
End for
-
14:
The feature weight vector of point q in the sub-sample set holds one weight for each of the n features.
-
15:
for i = 1 to n do
-
16:
if feature a_i belongs to the abnormal feature subspace of q then // The weight assigned to each feature in the feature weight vector of point q depends on whether the feature lies in the abnormal feature subspace.
-
17:
Set the weight of feature a_i to the abnormal entropy weight
-
18:
else
-
19:
Set the weight of feature a_i to the normal entropy weight
-
20:
End if
-
21:
End for
-
22:
for t = 1 to T do
-
23:
for each point p in the t-th sub-sample set do
-
24:
// Calculate the weighted distance based on the subspace between point q and each point in the sub-sample set.
-
25:
End for
-
26:
Perform the KNN (k-nearest neighbors) algorithm on the sub-sample set, using the weighted distance between point q and point p as the metric. // Obtain the k-neighborhood of point q on the sub-sample set based on the weighted distance.
-
27:
for k = 1 to K do
-
28:
// Calculate the two-reach distance between point q and the k-th point of its k-neighborhood
-
29:
End for
-
30:
Using the definition of detailed local reachability density, calculate its value for point q on the sub-sample set.
-
31:
By averaging over the detailed local reachability density of point q and those of the other points in its k-neighborhood, obtain the detailed local outlier factor, which reflects the abnormality of point q on the sub-sample set.
-
32:
End for
-
33:
Combine the subset-based scores to calculate the ensemble anomaly score of each data object in the dataset Y. This yields the ensemble anomaly score set O, in which each element is the ensemble anomaly score of the corresponding data object in Y.
-
34:
End
|
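
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation, because the listing does not give the formulas. The function names (`surprisal`, `lof_score`, `eoeh_sketch`) and the parameter names `psi`, `alpha`, `beta` are hypothetical. The subsample eigenentropy is replaced by a histogram-based surprisal, the abnormal-feature test is assumed to be "greater than the average", the weighted distance is assumed to be a weighted Euclidean distance, the standard LOF reachability distance and local reachability density stand in for the two-reach distance and the detailed local reachability density, and the ensemble score is assumed to be the mean over the T subsample sets.

```python
import numpy as np

def surprisal(values, sample, bins=10):
    """Hypothetical stand-in for the subsample eigenentropy of one feature:
    -log of the smoothed histogram probability of the bin holding each value."""
    edges = np.histogram_bin_edges(sample, bins=bins)
    counts, _ = np.histogram(sample, bins=edges)
    probs = (counts + 1) / (counts.sum() + bins)  # Laplace smoothing
    idx = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, bins - 1)
    return -np.log(probs[idx])

def lof_score(q, S, w, K):
    """Standard LOF of q against subsample S (q not in S), using a weighted
    Euclidean distance with q's feature weights w (an assumed metric)."""
    D = np.sqrt((w * (S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)                    # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :K]             # K nearest neighbors within S
    kdist = np.sort(D, axis=1)[:, K - 1]           # K-distance of each point of S
    reach = np.maximum(kdist[knn], np.take_along_axis(D, knn, axis=1))
    lrd_S = 1.0 / (reach.mean(axis=1) + 1e-12)     # density of each point of S
    dq = np.sqrt((w * (S - q) ** 2).sum(axis=1))   # weighted distance q -> each p
    nq = np.argsort(dq)[:K]                        # k-neighborhood of q
    lrd_q = 1.0 / (np.maximum(kdist[nq], dq[nq]).mean() + 1e-12)
    return lrd_S[nq].mean() / lrd_q

def eoeh_sketch(Y, T=10, psi=32, alpha=1.0, beta=0.1, K=5, seed=0):
    """Average, over T subsample sets, of the weighted-subspace LOF of each point."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    scores = np.zeros(m)
    for _ in range(T):
        idx = rng.choice(m, size=psi, replace=False)  # sampling without replacement
        # Per-feature surprisal of every point of Y relative to this subsample.
        H = np.column_stack([surprisal(Y[:, i], Y[idx, i]) for i in range(n)])
        for j in range(m):
            others = idx[idx != j]                    # subsample points other than q
            abnormal = H[j] > H[others].mean(axis=0)  # assumed comparison
            w = np.where(abnormal, alpha, beta)       # feature weight vector of q
            scores[j] += lof_score(Y[j], Y[others], w, K)
    return scores / T
```

Calling `eoeh_sketch(Y)` on an array of shape (m, n) returns one score per row, where larger values indicate more anomalous points. The sketch requires `K` to be at most `psi - 2`, so that every point of a subsample set has K neighbors even after q is removed from it.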