Bioinformatics. 2020 Jul 30;37(2):202–212. doi: 10.1093/bioinformatics/btaa690

Table 3.

AUROCs for various methods for selecting features

| Algorithm | K | α | Globally separable | Pairwise separable |
| --- | --- | --- | --- | --- |
| Sparse PCA (Zou et al., 2006) | K_true/2 | 1 | 0.38 ± 0.21 | 0.97 ± 0.03 |
| | | 5.75 | 0.59 ± 0.21 | 0.93 ± 0 |
| | | 10.5 | 0.48 ± 10⁻¹⁶ | 0.93 ± 0 |
| | | 15.25 | 0.5 ± 0 | 0.65 ± 0.08 |
| | | 20 | 0.5 ± 0 | 0.59 ± 0.01 |
| | K_true | 1 | 0.47 ± 0.29 | 1.0 ± 0 |
| | | 5.75 | 0.56 ± 0.21 | >0.99 ± 10⁻⁴ |
| | | 10.5 | 0.47 ± 0 | 1.0 ± 0 |
| | | 15.25 | 0.48 ± 0.01 | 0.78 ± 0.07 |
| | | 20 | 0.5 ± 0 | 0.63 ± 0.02 |
| | 2K_true | 1 | 0.52 ± 0.32 | 1.0 ± 0 |
| | | 5.75 | 0.43 ± 0 | >0.99 ± 10⁻³ |
| | | 10.5 | 0.43 ± 0 | 1.0 ± 0 |
| | | 15.25 | 0.47 ± 0.01 | 0.80 ± 0.7 |
| | | 20 | 0.5 ± 0 | 0.64 ± 0.01 |
| Sparse K-means (Witten and Tibshirani, 2010) | K_true/2 | 1 | 0.80 ± 0.40 | 1.0 ± 10⁻¹⁶ |
| | | 5.75 | 0.84 ± 0.32 | 1.0 ± 0 |
| | | 10.5 | 0.87 ± 0.18 | 1.0 ± 0 |
| | | 15.25 | 0.86 ± 0.23 | 1.0 ± 0 |
| | | 20 | 0.80 ± 0.40 | 1.0 ± 0 |
| | K_true | 1 | 0.96 ± 0.08 | 1.0 ± 0 |
| | | 5.75 | 0.88 ± 0.24 | 1.0 ± 10⁻¹⁶ |
| | | 10.5 | 0.80 ± 0.40 | 1.0 ± 0 |
| | | 15.25 | 0.94 ± 0.12 | 1.0 ± 0 |
| | | 20 | 1.0 ± 10⁻¹⁶ | 1.0 ± 0 |
| | 2K_true | 1 | 0.91 ± 0.18 | 1.0 ± 0 |
| | | 5.75 | 0.85 ± 0.30 | 1.0 ± 0 |
| | | 10.5 | 1.0 ± ~10⁻¹⁶ | 1.0 ± 0 |
| | | 15.25 | 0.84 ± 0.32 | 1.0 ± 10⁻¹⁶ |
| | | 20 | 0.83 ± 0.34 | 1.0 ± 0 |
| Sparse hierarchical clustering (Witten and Tibshirani, 2010) | N/A | 1 | 0 ± 0 | 0.54 ± 0.03 |
| | | 5.75 | 0 ± 0 | 0.57 ± 0.04 |
| | | 10.5 | 0 ± 0 | 0.59 ± 0.02 |
| | | 15.25 | 0 ± 0 | 0.56 ± 0.03 |
| | | 20 | 0 ± 0 | 0.59 ± 0.02 |
| LFSBSS (Li et al., 2008) | K_true/2 | N/A | 0.5 ± 0 | 0.5 ± 0 |
| | K_true | | 0.5 ± 0 | 0.5 ± 0 |
| | 2K_true | | 0.5 ± 0 | 0.5 ± 0 |
| Spectral selection (Zhao and Liu, 2007) | N/A | N/A | 0.5 ± 0 | 0.5 ± 0 |
| SMD (hierarchical proposal clusters) | Unif(2, K_true) | N/A | 1.0 ± 10⁻¹⁵ | 1.0 ± 0 |
| | Unif(2, 2K_true) | | 1.0 ± 0.02 | 1.0 ± 10⁻¹⁶ |
| | Unif(2, 4K_true) | | 1.0 ± 10⁻¹⁶ | 1.0 ± 10⁻¹⁶ |
| SMD (K-means proposal clusters) | Unif(2, K_true) | N/A | 0.91 ± 0.14 | 1.0 ± 0 |
| | Unif(2, 2K_true) | | 0.94 ± 0.05 | 1.0 ± 0 |
| | Unif(2, 4K_true) | | 1.0 ± 10⁻¹⁶ | 1.0 ± 0 |

Note: We generate two classes of distributions: globally separable, where a single dimension separates two clusters and the remaining dimensions are uninformative, and pairwise separable, where each informative dimension separates only one pair of clusters and the rest are uninformative. In both cases, the ratio of total to informative dimensions is D/D_s = 30. For each class of distributions we generated five instances and used the algorithm in the left column to infer a weight for each dimension. Some of the algorithms have input parameters, given in columns K (the number of clusters, or, in the case of Sparse PCA, the number of components) and α (a sparsity parameter). From the inferred weights we calculated the AUROC, and we report the mean and standard deviation over the five trials.
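To make the evaluation protocol concrete, the sketch below reproduces its shape for one method and one trial: generate a globally separable dataset, obtain a weight per dimension from a feature-selection method, and score those weights by AUROC against the known informative dimensions. This is not the authors' code; the data generator, the use of scikit-learn's SparsePCA as the weight source, and all parameter values are illustrative assumptions.

```python
# Minimal sketch of the AUROC evaluation described in the note (assumed setup,
# not the paper's implementation).
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Globally separable toy data: one informative dimension separates two clusters,
# the remaining dimensions are pure noise (D / D_s = 30).
n_per_cluster, d_signal, d_total = 100, 1, 30
X = rng.normal(size=(2 * n_per_cluster, d_total))
X[:n_per_cluster, :d_signal] += 5.0          # shift one cluster along the informative axis
is_informative = np.zeros(d_total, dtype=int)
is_informative[:d_signal] = 1

# Per-dimension weights from Sparse PCA: aggregate absolute loadings across
# components; alpha plays the role of the sparsity parameter in the table.
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
spca.fit(X)
weights = np.abs(spca.components_).sum(axis=0)

# AUROC measures how well the weights rank informative dimensions above noise.
print("AUROC:", roc_auc_score(is_informative, weights))
```

In the table, this single-trial score would be repeated over five generated instances of the distribution and summarized as mean ± standard deviation.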