Require: for each training instance, a vector of feature values and the class value
1: n ← number of training instances
2: a ← number of attributes (i.e. features)
3: Parameter: k ← number of nearest hits ‘H’ and misses ‘M’
4:
5: # STAGE 1
6: pre-process dataset {≈ a · n time complexity}
7: # STAGE 2
8: pre-compute distance array {≈ 0.5 · a · n² time complexity}
9: # STAGE 3
10: initialize all feature weights W[A] := 0.0
11: for i:=1 to n do
12:     # IDENTIFY NEIGHBORS
13:     for j:=1 to n do
14:         identify k nearest hits and k nearest misses (using distance array)
15:     end for
16:     # FEATURE WEIGHT UPDATE
17:     for all hits and misses do
18:         for A:=1 to a do
19:             W[A] := W[A] − diff(A, R_i, H)/(n · k) + diff(A, R_i, M)/(n · k)
20:         end for
21:     end for
22: end for
23: return the vector W of feature scores that estimate the quality of features
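
The three stages of the listing can be sketched in Python with NumPy. This is a minimal illustration and not the reference implementation. The function name `relieff_sketch` is hypothetical, and several choices are assumptions the listing does not fix: all features are numeric, pre-processing means min-max scaling to [0, 1], the distance is Manhattan, diff is the absolute difference of scaled values, and every instance of a different class counts as a miss (no per-class weighting). The sketch also builds the full distance matrix by broadcasting instead of only the half implied by the 0.5 · a · n² estimate, which suits small datasets only.

```python
import numpy as np


def relieff_sketch(X, y, k=10):
    """Return one ReliefF-style weight per feature (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, a = X.shape

    # STAGE 1: pre-process (assumption: min-max scale every feature to [0, 1]).
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - lo) / span

    # STAGE 2: pre-compute the n x n distance array (assumption: Manhattan).
    dist = np.abs(Xs[:, None, :] - Xs[None, :, :]).sum(axis=2)

    # STAGE 3: accumulate feature weights over all target instances R_i.
    W = np.zeros(a)
    for i in range(n):
        # IDENTIFY NEIGHBORS: candidates are split by class, excluding R_i.
        same = y == y[i]
        same[i] = False
        hit_idx = np.flatnonzero(same)
        miss_idx = np.flatnonzero(y != y[i])
        hits = hit_idx[np.argsort(dist[i, hit_idx])[:k]]
        misses = miss_idx[np.argsort(dist[i, miss_idx])[:k]]

        # FEATURE WEIGHT UPDATE: penalize differences to hits,
        # reward differences to misses, each scaled by 1 / (n * k).
        W -= np.abs(Xs[hits] - Xs[i]).sum(axis=0) / (n * k)
        W += np.abs(Xs[misses] - Xs[i]).sum(axis=0) / (n * k)
    return W
```

Calling `relieff_sketch(X, y, k=2)` on a small array whose first column separates the classes and whose second column is noise should rank the first feature above the second. If a class has fewer than k other members, the sketch uses the neighbors that exist but still divides by n · k, as in the update rule of the listing.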