2023 Aug 9;25(8):1185. doi: 10.3390/e25081185
Algorithm 1 EOEH
Input: dataset Y with dimension n and sample number m, number of subsample sets T, number of samples in each subsample set μ, abnormal entropy weight α, normal entropy weight β, neighborhood parameter K
Output: ensemble anomaly score set O
  • 1:

    Begin

  • 2:

    for t ← 1 to T do

  • 3:

        St = Random(Y,μ)   // Form the subsample set St by drawing μ points from the dataset Y at random without replacement

  • 4:

    End for

  • 5:

    q ∈ Y, p ∈ St; a = {a1, a2, ..., an} is the feature set of dataset Y

  • 6:

    for t ← 1 to T do

  • 7:

        for i ← 1 to n do

  • 8:

            SFE(St,ai)(q)   // Calculate the subsample feature entropy (SFE) of point q on the subsample set St with respect to feature ai

  • 9:

            if SFE(St,ai)(q) > the average of SFE(St,ai)(p) over the other points p in St then

  • 10:

                AFS(St)(q) = {ai ∈ a | ai is an abnormal feature of q}   // Add ai to the abnormal feature subspace of q in St, which consists of all abnormal features of q in St.

  • 11:

            End if

  • 12:

        End for

  • 13:

    End for

  • 14:

    Let FWV(q) = {ω1, ω2, ..., ωn} be the feature weight vector of point q in the subsample set St.

  • 15:

    for i ← 1 to n do

  • 16:

        if ai ∈ AFS(St)(q) then   // Each feature weight of point q depends on whether the feature lies in the abnormal feature subspace of q.

  • 17:

            ωi=α

  • 18:

        else

  • 19:

            ωi=β

  • 20:

        End if

  • 21:

    End for

  • 22:

    for t ← 1 to T do

  • 23:

        for each p ∈ St do

  • 24:

            SWdistance(St)(q,p)   // Calculate the subspace-weighted distance between point q and each point p in the subsample set St.

  • 25:

        End for

  • 26:

        Run the k-nearest neighbors (KNN) algorithm on the subsample set St, using the weighted distance SWdistance(St)(q,p) between points q and p as the metric.   // Obtain the k-neighborhood of point q on the subsample set based on the weighted distance.

  • 27:

        for j ← 1 to K do

  • 28:

            two reach-dist(St,k)(q,p)   // Calculate the two-reach distance of point q within its k-neighborhood

  • 29:

        End for

  • 30:

        Compute the detailed local reachability density dlrd(St,k)(q) of point q on the subsample set St from its definition.

  • 31:

        Compute the detailed local outlier factor dLOF(St,k)(q), which reflects the abnormality of point q on the subsample set St, by averaging over the dlrd(St,k) values of point q and of the other points in its k-neighborhood.

  • 32:

    End for

  • 33:

    For each data object q in the dataset Y, compute the ensemble anomaly score $O_q = \frac{1}{T}\sum_{t=1}^{T} \mathrm{dLOF}(S_t,k)(q)$, the average of its dLOF values over the T subsample sets. This yields the ensemble anomaly score set O = {O1, O2, ..., Om}, where Oi is the ensemble anomaly score of the i-th data object in the dataset Y.

  • 34:

    End
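
The bookkeeping steps of Algorithm 1 that are fully specified in the listing (subsampling in steps 2-4, abnormal-feature selection in steps 7-12, weight assignment in steps 14-21, and score averaging in step 33) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, and the SFE and dLOF values are taken as precomputed inputs because their definitions are not part of the listing.

```python
import random
from statistics import mean


def draw_subsamples(num_points, T, mu, seed=0):
    """Steps 2-4: draw T index sets of mu points each, without replacement."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    return [rng.sample(range(num_points), mu) for _ in range(T)]


def abnormal_features(sfe_q, sfe_others):
    """Steps 7-12: feature indices where q's SFE exceeds the mean over other points.

    sfe_q      -- list of n precomputed SFE values for point q (assumed input)
    sfe_others -- list of such lists, one per other point in the subsample
    """
    return {
        i
        for i in range(len(sfe_q))
        if sfe_q[i] > mean(row[i] for row in sfe_others)
    }


def feature_weight_vector(n, afs, alpha, beta):
    """Steps 14-21: weight alpha for abnormal features, beta otherwise."""
    return [alpha if i in afs else beta for i in range(n)]


def ensemble_score(dlof_values):
    """Step 33: average the dLOF values of one point over the T subsamples."""
    return sum(dlof_values) / len(dlof_values)


# Illustrative usage with made-up numbers (not data from the paper).
subsamples = draw_subsamples(num_points=10, T=3, mu=4)
afs = abnormal_features([0.9, 0.2, 0.5], [[0.4, 0.3, 0.6], [0.6, 0.5, 0.8]])
weights = feature_weight_vector(3, afs, alpha=2.0, beta=1.0)
score = ensemble_score([1.0, 2.0, 3.0])
```

In this toy example only the first feature exceeds the mean over the other points, so it alone receives weight α. The distance, two-reach distance, dlrd, and dLOF steps (22-32) are omitted because they depend on definitions given elsewhere in the paper.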