Table 1.
The best-performing gene selection procedures identified by FiGS for each of the six microarray datasets, with their .632+ bootstrap errors.
Dataset | Feature selection method | k | Gene expression pattern | Feature discretization | Feature vector addition | Classifier | Error |
---|---|---|---|---|---|---|---|
Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not apply | Not apply | SVM | 0.02 |
Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not apply | Not apply | RF | 0.02 |
Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not apply | Apply | SVM | 0.02 |
Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Not apply | Apply | RF | 0.02 |
Leukemia | Wilcoxon rank sum test | 10 | Down-regulated | Apply | Not apply | SVM | 0.02 |
Leukemia | Information gain method | 10 | Down-regulated | Not apply | Not apply | SVM | 0.02 |
Leukemia | Information gain method | 10 | Down-regulated | Not apply | Not apply | RF | 0.02 |
Leukemia | Information gain method | 10 | Down-regulated | Not apply | Apply | SVM | 0.02 |
Leukemia | Information gain method | 10 | Down-regulated | Not apply | Apply | RF | 0.02 |
Leukemia | Information gain method | 10 | Down-regulated | Apply | Not apply | SVM | 0.02 |
Leukemia | Information gain method | 10 | Down-regulated | Apply | Not apply | RF | 0.02 |
Colon | Information gain method | 30 | Up-regulated | Not apply | Not apply | RF | 0.11 |
Prostate | Information gain method | 25 | Total | Not apply | Not apply | RF | 0.05 |
Adenocarcinoma | Wilcoxon rank sum test | 10 | Up-regulated | Not apply | Not apply | RF | 0.10 |
Breast | Wilcoxon rank sum test | 15 | Down-regulated | Not apply | Apply | SVM | 0.31 |
Breast | Information gain method | 15 | Down-regulated | Not apply | Apply | SVM | 0.31 |
DLBCL | Wilcoxon rank sum test | 20 | Total | Not apply | Not apply | RF | 0.08 |
k is the number of selected genes; Error is the .632+ bootstrap error of the best-performing gene selection procedure, computed over 100 bootstrap samples. For the leukemia and breast datasets, multiple gene selection procedures achieve the same lowest error.
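For readers unfamiliar with the Error column, the following is a minimal sketch of how a .632+ bootstrap error is commonly computed, following the standard Efron and Tibshirani formulation. It is not taken from the FiGS implementation. The function names `no_information_rate` and `err632plus` are hypothetical, and the inputs (the resubstitution error on the full training set and the out-of-bag error averaged over the bootstrap samples) are assumed to have been computed beforehand.

```python
def no_information_rate(y_true, y_pred):
    """Expected error if predictions were independent of the true labels.

    gamma = sum over classes c of p_c * (1 - q_c), where p_c is the observed
    proportion of class c and q_c is the predicted proportion of class c.
    """
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    gamma = 0.0
    for c in classes:
        p_c = sum(1 for y in y_true if y == c) / n
        q_c = sum(1 for y in y_pred if y == c) / n
        gamma += p_c * (1.0 - q_c)
    return gamma


def err632plus(resub_err, oob_err, gamma):
    """Combine resubstitution and out-of-bag error into a .632+ estimate.

    resub_err: error of the classifier on its own training data
    oob_err:   out-of-bag error averaged over the bootstrap samples
    gamma:     no-information error rate
    """
    # Cap the out-of-bag error at the no-information rate.
    oob = min(oob_err, gamma)
    # Relative overfitting rate R, defined as 0 when there is no overfitting.
    if oob > resub_err and gamma > resub_err:
        r = (oob - resub_err) / (gamma - resub_err)
    else:
        r = 0.0
    # The weight moves from 0.632 (R = 0) to 1 (R = 1).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * resub_err + w * oob
```

Under these assumptions, a classifier that does not overfit (R = 0) gets the ordinary .632 weighting, while a heavily overfitted one (R = 1) is scored by its out-of-bag error alone.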