Sensors. 2018 Jun 12;18(6):1909. doi: 10.3390/s18061909
Algorithm 1. Feature Selection
Input:
  Original feature matrix XM with M-dimensional features.
Output:
  Selected feature subset S with D-dimensional features (D = 1, 2, …, M).
Procedure:
  1: D = 1. Compute SI(f_i) (i = 1, 2, …, M) for each dimension of the original feature matrix and record the score: score1 = SI(f_i). Choose the feature with the largest score1 as the first element of the optimal feature subset S. The remaining M − D features then form the set X_{M−D}.
  2: while D < M do
  Step 1: D = D + 1. Choose a feature from X_{M−D} in turn and combine it with S into a new candidate subset X_T; all candidate subsets X_T make up a new feature matrix X. Compute the class separability index (SI) of each feature subset in X, defined as SI = (2/D) Σ_{i=1}^{D} SI(f_i).
  Step 2: For each subset in the new feature matrix X formed in Step 1, form the t = C(D, 2) = D(D − 1)/2 feature pairs and compute DI as the average pairwise dissimilarity over these t pairs.
  Step 3: For each subset, compute score2 = SI + DI, which reflects how appropriate the feature subset is.
  Step 4: Add the feature with the largest score2 to S and update the remaining feature set X_{M−D}.
  Step 5: Feed the selected D-dimensional feature subset S into the classifier to obtain the classification accuracy accuracy(D).
  End while: the loop stops when the number of selected features D reaches M.
  3: Choose the best classification accuracy among accuracy(D) (D = 1, 2, …, M) as the final accuracy for this kind of feature after feature selection. If accuracy(i) = accuracy(j) but i < j (i, j = 1, 2, …, M), the smaller dimension i can be considered the optimal feature dimension.
Return: S = {s1, s2, …, sM}.
Note: A larger score2 means the feature is more beneficial to classification performance.
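Algorithm 1 can be sketched in Python as a sequential forward selection driven by score2 = SI + DI. The bodies of si() and di() below are illustrative assumptions (a between/within variance ratio and an average correlation distance over the C(D, 2) feature pairs); the paper defines its own SI and DI elsewhere, so only the selection loop itself mirrors the pseudocode.

```python
import numpy as np
from itertools import combinations

def si(X, y, cols):
    """Class-separability index (assumed form): per-feature ratio of
    between-class to within-class variance, combined as (2/D) * sum_i SI(f_i)."""
    total = 0.0
    for c in cols:
        f = X[:, c]
        mu = f.mean()
        between = sum((f[y == k].mean() - mu) ** 2 for k in np.unique(y))
        within = sum(f[y == k].var() for k in np.unique(y)) + 1e-12
        total += between / within
    return (2.0 / len(cols)) * total

def di(X, cols):
    """Dissimilarity index (assumed form): average absolute-correlation
    distance over the t = C(D, 2) feature pairs of the subset."""
    pairs = list(combinations(cols, 2))
    if not pairs:
        return 0.0  # a 1-D subset has no pairs (Step 1 of the procedure)
    dists = [1.0 - abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs]
    return float(np.mean(dists))

def forward_select(X, y):
    """Return the features of X ordered as Algorithm 1 would add them to S.

    Evaluating accuracy(D) on each D-feature prefix (Step 5) and keeping the
    best D is left to the caller's classifier of choice.
    """
    M = X.shape[1]
    # Step 1 (D = 1): seed S with the single feature maximizing score1 = SI(f_i).
    S = [max(range(M), key=lambda c: si(X, y, [c]))]
    remaining = [c for c in range(M) if c not in S]
    while remaining:  # loop until D reaches M
        # Steps 1-3: score each candidate subset S + {c} by score2 = SI + DI.
        best = max(remaining,
                   key=lambda c: si(X, y, S + [c]) + di(X, S + [c]))
        S.append(best)            # Step 4: grow S, shrink X_{M-D}
        remaining.remove(best)
    return S
```

On synthetic two-class data where one column carries a class-dependent shift, that column is picked first, since its between/within variance ratio dominates score1.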