2020 Apr 17;12084:792–804. doi: 10.1007/978-3-030-47426-3_61

A Framework for Feature Selection to Exploit Feature Group Structures

Kushani Perera, Jeffrey Chan, Shanika Karunasekera
Editors: Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, Sinno Jialin Pan
PMCID: PMC7206161

Abstract

Filter feature selection methods play an important role in machine learning tasks when low computational cost, classifier independence or simplicity is important. Existing filter methods predominantly focus on the input data alone and do not take advantage of external sources of feature group correlations to improve classification accuracy. We propose a framework which enables supervised filter feature selection methods to exploit feature group information from external sources of knowledge, and use this framework to incorporate feature group information into the minimum Redundancy Maximum Relevance (mRMR) algorithm, resulting in the GroupMRMR algorithm. We show that GroupMRMR achieves high accuracy gains over mRMR (up to 35%) and other popular filter methods (up to 50%). GroupMRMR has the same computational complexity as mRMR and therefore incurs no additional computational cost. The proposed method has many real world applications, particularly those that use genomic, text and image data, whose features demonstrate strong group structures.

Keywords: Filter feature selection, Feature groups, Squared $\ell_{1,2}$ norm minimisation

Introduction

Feature selection is proven to be an effective method for preparing high dimensional data for machine learning tasks such as classification. The benefits of feature selection include increased prediction accuracy, reduced computational costs, and more comprehensible data and models. Among the three main feature selection approaches, filter methods are preferred to wrapper and embedded methods in applications where computational efficiency, classifier independence, simplicity, ease of use and stability of results are required. Therefore, filter feature selection remains an interesting topic in many recent research areas such as biomarker identification for cancer prediction and drug discovery, text classification and predicting defective software [3-5, 10, 11, 16, 18], and has growing interest in big data applications [19]; according to Google Scholar search results, about 1,800 research papers related to filter methods were published in 2018, of which about 170 are in the gene selection area.

Most of the existing filter methods perform feature selection based on the instance-feature data alone [7]. However, in real world datasets, there are external sources of correlations within feature groups which can improve the usefulness of feature selection. For example, the genes in genomic data can be grouped based on the Gene Ontology terms they are annotated with [2] to improve bio-marker identification for the tasks such as disease prediction and drugs discovery. The words in documents can be grouped according to their semantics to select more significant words which are useful in document analysis [14]. The nearby pixels in images can be grouped together based on their spatial locality to improve selection of pixels for image classification. In software data, software metrics can be grouped according to their granularity in the code to improve the prediction of defective software [11, 18]. In Sect. 4, using a text dataset as a concrete example, we demonstrate the importance of feature group information for filter feature selection to achieve good classification accuracy.

Although feature group information has been used to improve feature selection in wrapper and embedded approaches [8, 12], group information is only rarely used to improve feature selection accuracy in filter methods. Yu et al. [19] propose a group based filter method, GroupSAOLA (GSAOLA), yet being an online method it achieves poor accuracy, which we show experimentally. The common way for embedded methods to exploit feature group information is to minimise the $\ell_{2,1}$ or $\ell_{1,2}$ norm of the feature weight matrix while minimising the classification error: depending on whether features are encouraged from the same group [8] or from different groups [12], the norm is chosen to cause inter-group or intra-group sparsity. Selecting features from different groups is shown to be more effective than selecting features from the same group [12].

Motivated by these approaches, we show that squared $\ell_{1,2}$ norm minimisation of the feature weight matrix can be used to encourage features from different feature groups in filter feature selection. We propose a generic framework which combines existing filter feature ranking methods with feature weight matrix norm minimisation, and use this framework to incorporate feature group information into the mRMR objective [7], because the mRMR algorithm achieves both high accuracy and efficiency compared to other filter methods [3, 4]. However, the proposed framework can be used to improve any other filter method, such as information gain based methods. As the constrained squared $\ell_{1,2}$ norm minimisation is an NP-hard problem, we propose a greedy feature selection algorithm, GroupMRMR, to achieve the feature selection objective, with the same computational complexity as the mRMR algorithm. We experimentally show that for datasets with feature group structures, GroupMRMR obtains significantly higher classification accuracy than existing filter methods. Our main contributions are as follows.

  • We propose a framework which supports the filter feature selection methods to utilise feature group information to improve their classification accuracy.

  • Using the proposed framework, we integrate feature group information into mRMR algorithm and propose a novel feature selection algorithm.

  • Through extensive experiments we show that our algorithm obtains significantly higher classification accuracy than the mRMR and existing filter feature selection algorithms for no additional computational costs.

Related Work

Utilisation of feature group information to improve prediction accuracy has been popular in embedded feature selection [8, 12, 17]. Among these, algorithms such as GroupLasso [8] encourage features from the same group, while algorithms such as Uncorrelated GroupLasso [12] encourage features from different groups. We adopt the second approach as it is proven to be more effective for real data [12]. Filter feature selection is preferred over wrapper and embedded methods due to its classifier independence, computational efficiency and simplicity, yet typically achieves comparatively low prediction accuracy. Moreover, most filter methods select features based on the instance-feature data alone, encoded in the data matrix, using information theoretic measures [7, 13, 15]. Some methods [20] use the feature group concept, yet those groups are also formed from the instance-feature data, to reduce feature redundancy. None of these methods takes advantage of external sources of knowledge about feature group structures. GSAOLA [19] is an online filter method which exploits feature groups; however, we experimentally show that our method significantly outperforms it in terms of accuracy.

Preliminaries

In this section and Table 1, we introduce the terms used later in the paper. Let C be the class variable of a dataset D, and let X and Y be any two feature variables.

Table 1.

Frequently used definitions

F             Set of all features
S             Selected feature subset, $S \subseteq F$
G             Set of all feature groups
I             Set of all feature group indices
$G_i$         Set of features in the $i$th feature group
$\alpha_i$    The weight of the $i$th feature group

Definition 1

Given that X and Y are two feature variables in D, with feature values x and y respectively, the mutual information between X and Y is given by $I(X;Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$.
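Definition 1 can be realised as a simple plug-in estimate over observed value counts. Below is a minimal sketch (not the authors' code), assuming discrete feature values and, since the base of the logarithm is not stated in the paper, log base 2:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length
    lists of discrete values."""
    n = len(xs)
    px = Counter(xs)                 # counts of each value of X
    py = Counter(ys)                 # counts of each value of Y
    pxy = Counter(zip(xs, ys))       # joint counts of (x, y) pairs
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) / (p(x)p(y)) = c*n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

For two identical binary features the estimate equals 1 bit, and for independent features it is 0, as expected.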

Definition 2

The relevancy of a feature X = $I(X;C)$.

Definition 3

The redundancy between two features X and Y = $I(X;Y)$.

Given that $W \in \mathbb{R}^{a \times b}$, $w_i$ is the $i$th row of W and $w_{ij}$ is the $j$th element of $w_i$, the squared $\ell_{1,2}$ norm of W is defined as $\|W\|_{1,2}^2 = \sum_{i}\|w_i\|_1^2 = \sum_{i}\big(\sum_{j}|w_{ij}|\big)^2$, where $\|w_i\|_1 = \sum_{j}|w_{ij}|$ = #(non-zero elements of $w_i$) when W is binary. For the scenarios in which the rows of W have different importance levels, we define the weighted variant $\|W\|_{1,2,\alpha}^2 = \sum_{i}\alpha_i\|w_i\|_1^2 = \sum_{i}\alpha_i\big(\sum_{j}|w_{ij}|\big)^2$, where $\alpha_i > 0$ is the weight of $w_i$. k is the required number of features.

Motivation and Background

Ignoring external sources of feature group correlations may result in poor classification accuracy on datasets whose features show group behaviour. We demonstrate this using the mRMR algorithm, a filter method which otherwise achieves good accuracy, as a concrete example.

mRMR Algorithm: The mRMR objective for selecting a feature subset $S \subseteq F$ of size k is as follows.

$$\max_{S \subseteq F}\ \frac{1}{|S|}\sum_{x \in S} I(x;C)\ -\ \frac{1}{|S|^2}\sum_{x,y \in S} I(x;y) \qquad (1)$$

To achieve the above objective, mRMR selects one feature at a time, maximising the relevancy of the new feature x with the class variable while minimising its redundancy with the already selected feature set S, as shown in Eq. (2).

$$x_t = \arg\max_{x \in F \setminus S}\Big[I(x;C) - \frac{1}{|S|}\sum_{y \in S} I(x;y)\Big] \qquad (2)$$
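The greedy step in Eq. (2) can be sketched as follows. This is an illustrative implementation, not the authors' code; `features` maps hypothetical feature names to lists of discrete values, and ties are broken by insertion order:

```python
import math
from collections import Counter

def mi(xs, ys):
    """Plug-in mutual information estimate in bits."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(features, labels, k):
    """Greedy mRMR: start with the most relevant feature, then repeatedly
    add the feature maximising relevancy minus mean redundancy (Eq. 2)."""
    relevancy = {f: mi(v, labels) for f, v in features.items()}
    selected = [max(relevancy, key=relevancy.get)]
    while len(selected) < k:
        def score(f):
            red = sum(mi(features[f], features[s]) for s in selected)
            return relevancy[f] - red / len(selected)
        candidates = [f for f in features if f not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

With a redundant copy of the first feature and an independent but informative alternative, the second pick avoids the copy, illustrating the redundancy term.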

Example 1: Consider selecting two features from the dataset in Fig. 1. Each document is classified into one of four types: Botany, Zoology, Physics or Agriculture. Each row corresponds to a feature, a word that may occur in the documents; 1 means the word has occurred in the document (or has occurred with high frequency) and 0 means otherwise.

Fig. 1.

Fig. 1.

Example text document dataset. Column: a document/instance, Row: a word/feature, Class: document type, 1/0: occurrence of a word, B: Botany, Z: Zoology, P: Physics, A: Agriculture

The relevancies of the features, Apple, Rice, Cow and Sheep are 0.549, 0.443, 0.311 and 0.311, respectively. mRMR first selects Apple, which has the highest relevancy. The redundancies of Rice, Cow and Sheep with respect to Apple are 0.07, 0.017 and 0.016, respectively. Therefore, mRMR next selects Rice, the feature with the highest relevancy redundancy difference, 0.373 (0.443 - 0.07). Global mRMR optimisation approaches [15] also select {Apple, Rice}.

Exploiting Feature Group Semantics: Figure 2 shows the value pattern distribution of the {Apple, Sheep} and {Apple, Rice} pairs within each class. For {Apple, Sheep}, the highest-probability value pattern in each class differs between classes; each value pattern is therefore associated with a different class, which helps distinguish all the document types from one another. For {Apple, Rice}, there is no such distinctive relationship between value patterns and classes: using the value pattern distribution, the classification algorithm cannot distinguish Zoology from Physics documents, nor Agriculture from Botany documents. This shows that features from different groups achieve better class discrimination.

Fig. 2.

Fig. 2.

Value pattern probabilities created by different feature subsets in each class. A: Agriculture, B: Botany, P: Physics, Z: Zoology, Class: the class assigned to the value pattern, %: the probability of the value pattern within the class; x, y $\in$ {0, 1}, a: Apple, r: Rice, s: Sheep

The reason behind the suboptimal result of the mRMR algorithm is its ignorance of the high level feature group structures. The words Apple and Rice form a group as they are plant names; Cow and Sheep form another group as they are animal names. The documents are classified according to whether they contain plant names and/or animal names, regardless of the exact plant or animal name they contain. Botany documents contain plant names (Apple or Rice) and no animal names; Zoology documents contain animal names (Cow or Sheep) and no plant names. This high level insight is not captured by the instance-feature data alone. Using feature group information as an external source of knowledge and encouraging features from different feature groups helps solve this problem.

Proposed Method: GroupMRMR

We propose a framework which facilitates filter feature selection methods to exploit feature group information to achieve better classification accuracy. Using this framework, we extend the mRMR algorithm into the GroupMRMR algorithm, which encourages features from different groups so as to bring in different semantics and select a more balanced feature set. We select the mRMR algorithm for extension because it has proven good classification accuracy at low computational cost compared to other filter feature selection methods. The feature groups are assigned weights ($\alpha_i$) to represent their importance levels, and GroupMRMR selects more features from the groups with higher importance. Group weights may be decided according to factors such as group size and group quality. For this paper, we assume that the feature groups do not overlap, but we plan to investigate overlapping groups in the future.

Feature Selection Objective

Our feature selection objective combines the filter feature selection objective with encouraging features from different feature groups. To encourage features from different groups, we minimise the squared $\ell_{1,2}$ norm of the feature weight matrix, W. Applying the $\ell_1$ norm at the intra-group level enforces intra-group sparsity, discouraging multiple features from the same group; applying the $\ell_2$ norm at the inter-group level encourages features from different feature groups [12].

Let $W \in \{0,1\}^{|I| \times |F|}$ be a feature weight matrix such that $w_{ij} = 1$ if $x_j \in S$ and $x_j \in G_i$, and $w_{ij} = 0$ otherwise. Given that g(W) is any maximisation quantity used in an existing filter feature selection objective which can be expressed as a function of W, and $\lambda \geq 0$ is a user defined parameter, our objective is to select $S \subseteq F$ to maximise the following, subject to $|S| = k$, $k \in \mathbb{Z}^+$:

$$\max_{S \subseteq F}\ g(W) - \lambda\|W\|_{1,2}^2 \qquad (3)$$

Given that $R_1 \in \mathbb{R}^{|F| \times |F|}$ is a diagonal matrix in which $(R_1)_{jj} = I(x_j;C)$ and $R_2 \in \mathbb{R}^{|F| \times |F|}$ is such that $(R_2)_{jl} = I(x_j;x_l)$ for $j \neq l$ and $(R_2)_{jl} = 0$ for $j = l$, it can be shown that the maximisation quantity in Eq. (1) equals $\frac{1}{k}\mathbf{1}^\top W R_1 W^\top \mathbf{1} - \frac{1}{k^2}\mathbf{1}^\top W R_2 W^\top \mathbf{1}$, where $W^\top$ is the transpose of W and $\mathbf{1}$ is the all-ones vector. That is, the maximisation quantity in the mRMR objective in Eq. (1) is a function of W. Consequently, g(W) in Eq. (3) can be replaced with the mRMR objective, as shown in Eq. (4).

$$\max_{S \subseteq F}\ \frac{1}{|S|}\sum_{x \in S} I(x;C) - \frac{1}{|S|^2}\sum_{x,y \in S} I(x;y) - \lambda\|W\|_{1,2}^2 \qquad (4)$$

Definition 4

Given that S and $G_i$ are as defined in Table 1, $k_i = |S \cap G_i|$ = the number of features belonging to both S and $G_i$.

Given that $k_i$ is as defined in Definition 4, according to Sect. 3, $\|W\|_{1,2}^2 = \sum_{i \in I} k_i^2$. When the feature groups have different weights, the rows of W also have different importance levels; in such scenarios, $\|W\|_{1,2,\alpha}^2 = \sum_{i \in I}\alpha_i k_i^2$, where $\alpha_i > 0$. Consequently, we can rewrite the objective in Eq. (4) as in Eq. (5), subject to $|S| = k$, $k \in \mathbb{Z}^+$. As the feature groups do not overlap, $\sum_{i \in I} k_i = |S|$. Using Eq. (5), we present Theorem 1, which shows that minimising $\sum_{i \in I}\alpha_i k_i^2$ is equivalent to encouraging features from different groups into S.

$$h(S) = \frac{1}{|S|}\sum_{x \in S} I(x;C) - \frac{1}{|S|^2}\sum_{x,y \in S} I(x;y) - \lambda\sum_{i \in I}\alpha_i k_i^2 \qquad (5)$$

Theorem 1

Given $\sum_{i \in I} k_i = |S| = k$, the minimum of $\sum_{i \in I}\alpha_i k_i^2$ is obtained when $\alpha_i k_i = \alpha_j k_j$, $\forall i, j \in I$, where $k \in \mathbb{Z}^+$ is a constant.

Proof

Using the Lagrange multipliers method, we show that the minimum of $\sum_{i \in I}\alpha_i k_i^2$ is achieved when $k_i = \frac{\mu}{2\alpha_i}$, where $\mu$ is the Lagrange multiplier, so that $\alpha_i k_i = \alpha_j k_j$, $\forall i, j \in I$. Please refer to this link (see footnote 1) for the detailed proof.
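The Lagrangian step of the proof can be sketched as follows, relaxing the integer counts $k_i$ to reals:

```latex
\mathcal{L}(k_1,\dots,k_{|I|},\mu)
  = \sum_{i\in I}\alpha_i k_i^2 \;-\; \mu\Big(\sum_{i\in I} k_i - k\Big),
\qquad
\frac{\partial\mathcal{L}}{\partial k_i} = 2\alpha_i k_i - \mu = 0
\;\Rightarrow\;
k_i = \frac{\mu}{2\alpha_i}.
```

Hence $\alpha_i k_i = \mu/2$ for every i, i.e. $\alpha_i k_i = \alpha_j k_j$; since the objective is convex in the $k_i$, this stationary point is the minimum. With equal weights, the k selected features are spread evenly across the groups.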

Iterative Feature Selection

As the minimisation of $\|W\|_{1,2}^2$ under the constraint $|S| = k$ is NP-hard, we propose a heuristic algorithm to achieve the objective in Eq. (4). The algorithm selects a feature, $x_t$, at each iteration t to maximise the difference between $h(S_t)$ and $h(S_{t-1})$, where $S_t$ and $S_{t-1}$ are the feature subsets selected after Iterations t and t-1 respectively, and $h(\cdot)$ is as defined in Eq. (5). As there are datasets with millions of features, we seek an algorithm that selects $x_t$ in linear time. Theorem 2 shows that $h(S_t) - h(S_{t-1})$ can be maximised by adding the term $-\lambda\alpha_p(2k_p^{t-1}+1)$ to the mRMR criterion in Eq. (2), where p is the group of the evaluated feature x, $k_p^{t-1}$ is the number of features already selected from p before Iteration t, and $\alpha_p$ is the weight of p.

[Algorithm 1: the GroupMRMR greedy selection procedure]

Theorem 2

Given that $S_t$, $S_{t-1}$, $h(\cdot)$, $k_p^{t-1}$, p, $\alpha_p$ and $\lambda$ are as defined above and $U_{t-1}$ is the unselected feature subset after Iteration t-1,
$$\arg\max_{x \in U_{t-1}} \big[h(S_{t-1} \cup \{x\}) - h(S_{t-1})\big] = \arg\max_{x \in U_{t-1}} \Big[I(x;C) - \frac{1}{|S_{t-1}|}\sum_{y \in S_{t-1}} I(x;y) - \lambda\alpha_p\big(2k_p^{t-1}+1\big)\Big].$$

Proof

To prove this, we use the fact that $h(S_{t-1})$ and $|S_{t-1}|$ are constants at a given iteration. Please refer to this link (see footnote 1) for the detailed proof.

Based on Theorem 2, we propose the GroupMRMR algorithm. At each iteration, the feature score of each feature in U is computed as shown in Line 5 of Algorithm 1. The feature with the highest score is removed from U and added to S (Lines 7-10 in Algorithm 1). The algorithm can be modified to encourage features from the same group instead, by setting $\lambda < 0$.
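The scoring rule above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it assumes log-base-2 mutual information, and the penalty term is the increment of $\lambda\sum_i \alpha_i k_i^2$ when one feature is added to group p; names such as `group_of` are hypothetical.

```python
import math
from collections import Counter

def mi(xs, ys):
    """Plug-in mutual information estimate in bits."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def group_mrmr(features, labels, group_of, k, lam=1.0, alpha=None):
    """Greedy GroupMRMR sketch. features: {name: values},
    group_of: {name: group id}, lam: lambda, alpha: {group id: weight}."""
    alpha = alpha or {g: 1.0 for g in set(group_of.values())}
    selected, counts = [], Counter()       # counts[g] = k_g selected from group g
    while len(selected) < k:
        def score(f):
            rel = mi(features[f], labels)
            red = (sum(mi(features[f], features[s]) for s in selected)
                   / len(selected) if selected else 0.0)
            p = group_of[f]
            # increment of lam * sum_i alpha_i * k_i^2 when adding f to group p
            penalty = lam * alpha[p] * (2 * counts[p] + 1)
            return rel - red - penalty
        best = max((f for f in features if f not in selected), key=score)
        selected.append(best)
        counts[group_of[best]] += 1
    return selected
```

Setting `lam=0` recovers plain greedy mRMR; a positive `lam` steers the second pick away from the group of the first.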

Example 1 Revisited: Next, we apply GroupMRMR to Example 1. We assume $\lambda = 1$ and $\alpha_i = \alpha_j = 1$, $\forall i, j \in I$. GroupMRMR first selects Apple, the feature with the highest relevancy (0.549). In Iteration 2, the $k_p^{t-1}$ values for Rice, Cow and Sheep are 1, 0 and 0, respectively, so the group penalty terms $\lambda\alpha_p(2k_p^{t-1}+1)$ are 3, 1 and 1, respectively. The redundancies of each feature with Apple are the same as computed in Sect. 4. The feature scores for Rice, Cow and Sheep are -2.627 (0.443 - 0.07 - 3), -0.706 (0.311 - 0.017 - 1) and -0.705 (0.311 - 0.016 - 1), respectively, and GroupMRMR selects Sheep, the feature with the highest feature score. Therefore, GroupMRMR selects {Apple, Sheep}, the optimal feature subset, as discussed in Sect. 4.

Computational Complexity: The computational complexity of GroupMRMR is the same as that of mRMR, which is O(|S||F|), where |S| and |F| are the cardinalities of the selected feature subset and the complete feature set, respectively. As $|S| \ll |F|$, GroupMRMR is effectively linear in |F|.

Experiments

This section discusses the experimental results for GroupMRMR for real datasets.

Datasets: We evaluate GroupMRMR using real datasets which are benchmarks for testing group based feature selection; Table 2 summarises them. Images in Yale have a 32 × 32 pixel map. GRV is a JIRA software defect dataset whose features are code quality metrics.

Table 2.

Dataset description. m: # features, n: # instances, c: # classes

Dataset                m      n      c   Type
Multi-Tissue (MT) [1]  1,000  103    4   Genomic
Leukemia (LK) [1]      999    38     3   Genomic
Multi-A [1]            5,565  103    4   Genomic
CNS [1]                989    42     5   Genomic
Yale [6]               1,024  165    15  Image
BBC [9]                9,635  2,225  5   Text
Groovy (GRV) [18]      65     757    2   Software

Grouping Features: The pixel map of each image is partitioned into non-overlapping m × m squares such that each square is a feature group. This introduces spatial locality information not available from the instance-feature data itself. The genes in genomic data are clustered based on their Gene Ontology term annotations, as described in [2]. The number of groups is set to 0.04 of the original feature set size, based on previous findings for the MT dataset [2]. Words in the BBC dataset are clustered using the k-means algorithm, based on the semantics available from Word2Vec [14]. We use only 2,411 features: the words available in the Brown corpus. The number of word groups is 50, selected by cross validation on the training data. The code metrics in the software defect data are grouped into five groups based on their granularity in the code [18].
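The pixel grouping amounts to mapping each pixel coordinate to the index of the m × m square containing it. A minimal sketch, assuming row-major numbering of the squares (the numbering scheme is not specified in the paper):

```python
def pixel_group(row, col, m=4, img_size=32):
    """Feature group id of pixel (row, col) when an img_size x img_size
    pixel map is partitioned into non-overlapping m x m squares."""
    blocks_per_row = img_size // m          # e.g. 32 // 4 = 8 squares per row
    return (row // m) * blocks_per_row + (col // m)
```

With m = 4 this yields 64 groups of 16 pixels each for a 32 × 32 Yale image; m = 8 yields 16 larger groups.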

Baselines: We compare GroupMRMR with existing filter methods of proven high accuracy. The mRMR algorithm, of which GroupMRMR is an extension, is a greedy approach to achieving the mRMR objective, while SPECCMI [15] is a global optimisation algorithm for the same objective. Conditional Mutual Information Maximisation (CMIM) [15] is a mutual information based filter method outside the mRMR family. ReliefF [13] is a distance based filter method. GSAOLA [19] is an online filter method which utilises feature group information.

Evaluation Method: The classifier's prediction accuracy on the test dataset with the selected features is taken as the prediction accuracy of the feature selection algorithm. It is measured as Macro-F1, the average of the per-class F1-scores (AVGF). Average accuracy is the average of the AVGF values over all selected feature numbers up to the point where the algorithms' accuracies converge. The log of the average run time (in seconds) is reported.
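Macro-F1 (AVGF) is the unweighted mean of per-class F1 scores, so every class counts equally regardless of its size. A minimal from-scratch sketch of the metric:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2*tp / (2*tp + fp + fn); defined as 0 when tp == 0
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)
```

For example, with true labels [0, 0, 1, 1] and predictions [0, 0, 1, 0], class 0 scores F1 = 0.8 and class 1 scores F1 = 2/3, giving a Macro-F1 of 11/15.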

Experimental Setup: We split each dataset into a training set (60% of instances) and a test set (40%) using stratified random sampling. Feature selection is performed on the training set, and the classifier is trained on the training set with the selected features. The classifier is then used to predict the labels of the test set. Due to the small sample sizes of the datasets, we do not use a separate validation set for tuning $\lambda$; instead, we select the $\lambda \in [0, 2]$ which gives the highest classification accuracy on the training set. The classifier used is the Support Vector Machine. For image data, the default is m = 4. For genomic data, $\alpha_i = 1$, $\forall i$. For other datasets, $\alpha_i$ is set as a function of $G_i$ and F ($G_i$, F are defined in Table 1).

Experiment 1: Measures the classification accuracy obtained for the datasets with the selected features. Experiment 2: Performs feature selection for image datasets with different feature group sizes, m × m (m = 2, 4, 8); this tests the effect of group size on classification accuracy. Experiment 3: Runs GroupMRMR for different $\lambda \in [-1, 1]$; this tests the effect of $\lambda$ on classification accuracy. Experiment 4: Executes each feature selection algorithm 20 times and computes the average run time, to evaluate algorithm efficiency.

Experimental Results: Table 3 shows that GroupMRMR achieves the highest AVGF over the baselines in all datasets. In the LK dataset, 100% accuracy is achieved with fewer features than the baselines require. GroupMRMR achieves higher or equal average accuracy compared to the baselines in 32 out of 35 cases. Figure 3 shows that, despite a slightly lower average accuracy compared to ReliefF, GroupMRMR maintains a higher accuracy than the baselines in Multi-A for most of the selected feature numbers. Other datasets show similar results, yet we show only three graphs due to space limitations; please refer to this link (see footnote 1) for all the result graphs. The maximum accuracy gain of GroupMRMR over the accuracy of the complete feature set is 2%, 10%, 2%, 2%, 1% and 6% for the MT, CNS, Multi-A, Yale, BBC and GRV datasets, respectively. The maximum accuracy gain of GroupMRMR over SPECCMI is 50%, in the Yale dataset at 50 selected features. The highest accuracy gain of GroupMRMR over mRMR is 35%, in the CNS dataset at 70 selected features. Figure 4a shows that the classification accuracy of GroupMRMR for 8 × 8 image partitions is lower than for 4 × 4 and 2 × 2 partitions. Figure 4b shows that the classification accuracy is not very sensitive to $\lambda$ in the [0, 1] range, yet degrades considerably when $\lambda < 0$. Figure 4c shows that the run time of GroupMRMR is almost the same as that of mRMR and lower than most of the other baseline methods (about 10 times lower than SPECCMI and CMIM for the BBC dataset).

Table 3.

Comparison of accuracies achieved by different algorithms. Row 1: the maximum accuracy (in AVGF) gained by each algorithm in each dataset; the highest maximum AVGF for each dataset is in bold. Row 2 (x): the number of features at which the highest AVGF is achieved. Row 3 (%): the average accuracy gain of GroupMRMR over the baseline. +: GroupMRMR wins; -: GroupMRMR loses

MT CNS LK Multi-A Yale BBC GRV
GroupMRMR 1 0.9 1 1 0.85 0.95 0.66
(110) (90) (20) (90) (500) (800) (10)
mRMR 0.98 0.88 0.94 0.95 0.83 0.93 0.57
(70) (180) (40) (110) (450) (400) (30)
+4% +11% +4% +5% +7% 0% +4%
GSAOLA 0.95 0.86 1 0.95 0.84 0.93 0.56
(60) (50) (50) (170) (600) (1000) (25)
+1% +2% +2% +3% +17% +3% +3%
SPECCMI 0.9 0.71 1 0.95 0.80 0.93 0.61
(90) (180) (190) (190) (500) (1000) (30)
+12% +16% +17% +8% +14% +7% −1%
CMIM 0.95 0.83 0.88 0.93 0.8 0.92 0.61
(200) (160) (90) (80) (600) (800) (25)
+10% +19% +32% +9% +13% +8% −1%
ReliefF 0.95 0.83 1 1 0.8 0.93 0.52
(60) (170) (80) (80) (450) (1000) (25)
+2% +6% +3% −1% +12% +2% +6%

Fig. 3.

Fig. 3.

Classification accuracy variation with the number of selected features

Fig. 4.

Fig. 4.

Accuracy and runtime variations for the Yale and BBC datasets. (a) Accuracy variation with the group size (Yale). (b) Accuracy variation with $\lambda$ (Yale). (c) Average run time variation (in log scale) of the algorithms (BBC). 95% confidence interval error bars are too small to be visible due to the high precision (standard deviations < 2 s)

Evaluation Insights: GroupMRMR consistently shows good classification accuracy compared to the baselines across all datasets (highest average accuracy and highest maximum accuracy in almost all datasets). The equal run times of GroupMRMR and mRMR show that the accuracy gain is obtained at no additional cost, supporting the time complexity analysis in Sect. 5. Better prediction accuracy is obtained for small groups, because large feature groups resemble the original feature set with no groupings; this shows the importance of feature group information for gaining high feature selection accuracy. The accuracy is lower when features are encouraged from the same group ($\lambda < 0$) instead of from different groups ($\lambda > 0$), which supports our hypothesis. The classification accuracy is not very sensitive to $\lambda \in [0, 1]$; therefore, little parameter tuning is required.

Conclusion

We propose a framework which facilitates filter feature selection methods to exploit feature group information as an external source of knowledge. Using this framework, we incorporate feature group information into the mRMR algorithm, resulting in the GroupMRMR algorithm. We show that, compared to baselines, GroupMRMR achieves high classification accuracy for datasets with feature group structures. The run time of GroupMRMR is the same as that of mRMR, which is lower than that of many existing feature selection algorithms. Our future work includes applying the proposed framework to other filter methods and detecting whether a dataset contains feature group structures.

Acknowledgements

This work is supported by the Australian Government.


Contributor Information

Hady W. Lauw, Email: hadywlauw@smu.edu.sg

Raymond Chi-Wing Wong, Email: raywong@cse.ust.hk.

Alexandros Ntoulas, Email: antoulas@di.uoa.gr.

Ee-Peng Lim, Email: eplim@smu.edu.sg.

See-Kiong Ng, Email: seekiong@nus.edu.sg.

Sinno Jialin Pan, Email: sinnopan@ntu.edu.sg.

Kushani Perera, Email: bperera@student.unimelb.edu.au.

Jeffrey Chan, Email: jeffrey.chan@rmit.edu.au.

Shanika Karunasekera, Email: karus@unimelb.edu.au.

References

1. Cancer program datasets. http://portals.broadinstitute.org/cgi-bin/cancer/datasets.cgi. Accessed Nov 2019
2. Acharya, S., Saha, S., Nikhil, N.: Unsupervised gene selection using biological knowledge: application in sample clustering. BMC Bioinform. 18(1), 513 (2017). doi: 10.1186/s12859-017-1933-0
3. Alirezanejad, M., Enayatifar, R., Motameni, H., et al.: Heuristic filter feature selection methods for medical datasets. Genomics (2019). doi: 10.1016/j.ygeno.2019.07.002
4. Bolón-Canedo, V., Rego-Fernández, D., Peteiro-Barral, D., Alonso-Betanzos, A., Guijarro-Berdiñas, B., Sánchez-Maroño, N.: On the scalability of feature selection methods on high-dimensional data. Knowl. Inf. Syst. 56(2), 395–442 (2017). doi: 10.1007/s10115-017-1140-3
5. Bommert, A., Sun, X., Bischl, B., et al.: Benchmark for filter methods for feature selection in high-dimensional classification data. CSDA 143, 106839 (2020)
6. Cai, D., He, X., Hu, Y., et al.: Learning a spatially smooth subspace for face recognition. In: Proceedings of IEEE CVPR 2007, pp. 1–7 (2007)
7. Ding, C., Peng, H.: Minimum redundancy feature selection from microarray gene expression data. JBCB 3(02), 185–205 (2005). doi: 10.1142/s0219720005001004
8. Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736 (2010)
9. Greene, D., Cunningham, P.: Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proceedings of the 23rd ICML, pp. 377–384 (2006). doi: 10.1145/1143844.1143892
10. Hancer, E., Xue, B., Zhang, M.: Differential evolution for filter feature selection based on information theory and feature ranking. Knowl.-Based Syst. 140, 103–119 (2018). doi: 10.1016/j.knosys.2017.10.028
11. Jiarpakdee, J., Tantithamthavorn, C., Treude, C.: AutoSpearman: automatically mitigating correlated metrics for interpreting defect models. arXiv preprint arXiv:1806.09791 (2018)
12. Kong, D., Liu, J., Liu, B., et al.: Uncorrelated group lasso. In: AAAI, pp. 1765–1771 (2016)
13. Kononenko, I.: Estimating attributes: analysis and extensions of RELIEF. In: Bergadano, F., De Raedt, L. (eds.) Machine Learning: ECML-94, pp. 171–182. Springer, Heidelberg (1994)
14. Lilleberg, J., Zhu, Y., Zhang, Y.: Support vector machines and word2vec for text classification with semantic features. In: Proceedings of the 14th IEEE ICCI*CC, pp. 136–140 (2015). doi: 10.1109/ICCI-CC.2015.7259377
15. Nguyen, X.V., Chan, J., Romano, S., et al.: Effective global approaches for mutual information based feature selection. In: Proceedings of the 20th ACM SIGKDD, pp. 512–521 (2014). doi: 10.1145/2623330.2623611
16. Uysal, A.K., Gunal, S.: A novel probabilistic feature selection method for text classification. Knowl.-Based Syst. 36, 226–235 (2012). doi: 10.1016/j.knosys.2012.06.005
17. Wang, J., Wang, M., Li, P., et al.: Online feature selection with group structure analysis. IEEE TKDE 27(11), 3029–3041 (2015)
18. Yatish, S., Jiarpakdee, J., Thongtanunam, P., et al.: Mining software defects: should we consider affected releases? In: Proceedings of the 41st International Conference on Software Engineering, pp. 654–665. IEEE Press (2019)
19. Yu, K., Wu, X., Ding, W., et al.: Scalable and accurate online feature selection for big data. ACM TKDD 11(2), 16 (2016). doi: 10.1145/2976744
20. Yu, L., Ding, C., Loscalzo, S.: Stable feature selection via dense feature groups. In: Proceedings of the 14th ACM SIGKDD, pp. 803–811 (2008). doi: 10.1145/1401890.1401986

Articles from Advances in Knowledge Discovery and Data Mining are provided here courtesy of Nature Publishing Group
