Input: data matrix X and response vector y, where n is the number of data samples and p is the number of features. Depending on the FS algorithm, additional hyper-parameters and/or pre-processing of the data (e.g., discretization) may be needed.
Process:
1. For each FS algorithm, create an empty set S, which will contain the indices of the selected features.
2. Randomly select 90% of the data samples from the data matrix X, along with their responses y.
3. Run the FS algorithm on the 90% randomly selected samples. The result is an ordered sequence of k features (often the number of output features can be chosen, which saves computational time), where the first feature is the most important according to the chosen FS algorithm.
4. Repeat steps 2 and 3 multiple times, say M, and store the results in a matrix F of size M × k. Each of the M rows of F stores the feature subset selected in one repetition.
5. Voting to decide the final feature subset for each FS algorithm: feature indices are included in S incrementally, one at a time. At each step j (j = 1, …, k), we find the indices of the features selected up to that step across all M repetitions from step 4 (i.e., we use the submatrix formed by the first j columns of F; in the last step, j = k).
6. We select the feature index that appears most frequently among these M · j elements and that is not already included in S. This index is included as the j-th element of S. Ties are resolved by including the lowest index number.
7. Repeat steps 5 and 6 until S contains the desired number of features (i.e., j = 1, …, k).
Output: vector S with the ordered sequence of selected features in descending order of importance. The indices in this sequence correspond to the columns of the data matrix X.
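The procedure above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names (`ensemble_feature_selection`, `fs_algorithm`) and the callback interface are assumptions introduced for the example, and any concrete FS method returning an ordered list of k feature indices can be plugged in.

```python
import numpy as np

def ensemble_feature_selection(X, y, fs_algorithm, k, n_repeats=50,
                               subsample=0.9, seed=0):
    """Repeated-subsampling FS with voting (steps 1-7 above).

    fs_algorithm(X_sub, y_sub, k) is a hypothetical interface: it must
    return k feature indices ordered from most to least important.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(round(subsample * n))          # 90% of the samples by default
    # Steps 2-4: run FS on M random subsamples; row r of F holds the
    # ordered feature subset selected in repetition r (F is M x k).
    F = np.empty((n_repeats, k), dtype=int)
    for r in range(n_repeats):
        idx = rng.choice(n, size=m, replace=False)
        F[r] = fs_algorithm(X[idx], y[idx], k)
    # Steps 5-7: voting. At step j, count how often each feature appears
    # among the first j columns of F across all M repetitions.
    S = []
    for j in range(1, k + 1):
        votes = np.bincount(F[:, :j].ravel(), minlength=X.shape[1])
        votes[S] = -1                      # exclude already-selected features
        S.append(int(np.argmax(votes)))    # argmax breaks ties by lowest index
    return S
```

As a usage example, a simple correlation-ranking FS callback (again an illustrative stand-in, not a method from the source) could be:

```python
def corr_rank(Xs, ys, k):
    # Absolute correlation of each feature column with the response.
    scores = np.abs(np.corrcoef(Xs.T, ys)[:-1, -1])
    return np.argsort(-scores)[:k]
```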