Input: Input data X : x1, x2, ..., xm and labels Y : y1, y2, ..., ym, where m is the number of samples. Each x is an n-dimensional gene vector. s is the step size of RFE. |
Output: Ranked list GR of all the genes. |
1: for k = 1:10 do |
2: Randomly divide the data set into ten equal parts; |
3: Keep one part as test data; use the remaining nine parts as training data; |
4: while X is not empty do |
5: Train a model based on training data of X using SVM; |
6: Calculate the prediction accuracy of the model using the test data; |
7: Obtain the weight of each gene produced from SVM; |
8: Remove the s least weighted genes and update X; |
9: end while |
10: Obtain the gene subset G1 with the highest prediction accuracy; |
11: while X is not empty do |
12: Train a model based on training data of X using RF; |
13: Calculate the prediction accuracy of the model using the test data; |
14: Obtain the importance of each gene produced from RF; |
15: Remove the s least important genes and update X; |
16: end while |
17: Obtain the gene subset G2 with the highest prediction accuracy; |
18: Count the votes for all the genes contained in both G1 and G2; |
19: end for |
20: Rank genes based on votes and obtain GR. |
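
The control flow of the procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Model training is abstracted behind a hypothetical callable `rank_fn(genes)` that returns a test accuracy and a per-gene weight (SVM weights in the first pass, RF importances in the second). The names `rfe_best_subset`, `vote_rank`, and `make_rankers` are illustrative. The sketch also assumes that the ten folds are drawn once before the loop, that each RFE pass starts again from the full gene set, and that step 18 gives one vote per fold to every gene appearing in both G1 and G2.

```python
import random


def rfe_best_subset(genes, rank_fn, step):
    """One RFE pass (steps 4-10 or 11-17): return the best-scoring gene subset.

    rank_fn(genes) is a hypothetical callable that trains a model on the
    training fold restricted to `genes` and returns (accuracy, weights),
    where `weights` maps each gene to its weight or importance.
    """
    remaining = list(genes)
    best_acc, best_subset = -1.0, []
    while remaining:
        acc, weights = rank_fn(remaining)
        if acc > best_acc:
            best_acc, best_subset = acc, list(remaining)
        # Remove the `step` genes with the smallest absolute weight.
        remaining.sort(key=lambda g: abs(weights[g]), reverse=True)
        remaining = remaining[:max(len(remaining) - step, 0)]
    return best_subset


def vote_rank(genes, n_samples, make_rankers, step, n_folds=10, seed=0):
    """Vote over folds and rank genes by the number of votes received.

    make_rankers(train_idx, test_idx) is a hypothetical factory returning
    two rank_fn callables: one backed by an SVM, one by a random forest.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Assumption: the folds are drawn once, before the loop over k.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    votes = {g: 0 for g in genes}
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        svm_rank, rf_rank = make_rankers(train_idx, test_idx)
        # Assumption: both passes start from the full gene set.
        g1 = rfe_best_subset(genes, svm_rank, step)
        g2 = rfe_best_subset(genes, rf_rank, step)
        # Assumption: a gene earns a vote when it is in both G1 and G2.
        for g in set(g1) & set(g2):
            votes[g] += 1
    ranked = sorted(genes, key=lambda g: votes[g], reverse=True)
    return ranked, votes
```

In practice, `make_rankers` would wrap whichever SVM and random forest implementations are in use. Keeping them behind a callable leaves the voting logic independent of any particular library.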