Table 2: Classification accuracies of feature selection methods using the tree-based learner.
This table reports the classification accuracies of the feature selection methods on six publicly available datasets. Fisher refers to the Fisher score, PFA to principal feature analysis, and All-Feature to a learner trained on all input features. Each method selects k = 50 features. The classifier is an Extremely Randomized Trees classifier (a variant of random forests) with 50 trees. All reported values are accuracies on a hold-out test set; higher is better.
Dataset | (n, d) | # Classes | All-Feature | Fisher | HSIC-Lasso | PFA | LassoNet |
---|---|---|---|---|---|---|---|
Mice Protein | (1080, 77) | 8 | 0.997 | 0.996 | 0.996 | 0.997 | 0.997 |
MNIST | (10000, 784) | 10 | 0.941 | 0.818 | 0.869 | 0.879 | 0.892 |
MNIST-Fashion | (10000, 784) | 10 | 0.831 | 0.660 | 0.775 | 0.784 | 0.794 |
ISOLET | (7797, 617) | 26 | 0.951 | 0.818 | 0.888 | 0.855 | 0.891 |
COIL-20 | (1440, 400) | 20 | 0.996 | 0.996 | 0.993 | 0.993 | 0.993 |
Activity | (5744, 561) | 6 | 0.859 | 0.794 | 0.845 | 0.808 | 0.860 |
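The evaluation protocol in the caption (select k = 50 features, train an Extremely Randomized Trees classifier with 50 trees, score on a hold-out test set) can be sketched as follows. This is a minimal illustration, not the paper's code: the synthetic dataset and the univariate scorer `f_classif` (standing in for a Fisher-style score) are assumptions for demonstration.

```python
# Sketch of the caption's protocol: pick k = 50 features, train an
# Extremely Randomized Trees classifier with 50 trees, report hold-out
# accuracy. Dataset and scoring function are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one of the benchmark datasets.
X, y = make_classification(n_samples=1000, n_features=200,
                           n_informative=60, n_classes=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Select k = 50 features on the training split only, to avoid leakage.
selector = SelectKBest(f_classif, k=50).fit(X_train, y_train)
X_train_k = selector.transform(X_train)
X_test_k = selector.transform(X_test)

# Extremely Randomized Trees with 50 trees, as in the caption.
clf = ExtraTreesClassifier(n_estimators=50, random_state=0)
clf.fit(X_train_k, y_train)
accuracy = clf.score(X_test_k, y_test)  # hold-out accuracy; higher is better
print(round(accuracy, 3))
```

Fitting the selector on the training split only mirrors the table's setup, where reported values come from data the selector never saw.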