Algorithm 1. The Auto-Filter procedure. It receives a set of candidate filter methods S, a set of candidate k values K, the training dataset of the external cross-validation, a classification algorithm (in this work, random forest) used in the internal comparisons of the candidate filters, and a target predictive performance metric (in this work, the AUC). It returns the combination of candidate filter and k value with the largest average score over the internal cross-validation folds. Note that in this pseudocode the term ‘filter method’ is used generically: it can denote either a single filter method or its counterpart filter ensemble.
function Auto-Filter(S, K, training_set, classifier, perf_metric)
    candidate_filters = [ ]
    internal_CV = createStratifiedCrossValidationSets(training_set, 5)
    For each filter in S:
        For each k in K:
            score_array = [ ]
            For each estimation_set, validation_set in internal_CV:
                filter.calculateFeatureScores(estimation_set)
                estimation_set.applyFilter(filter, k)
                validation_set.applyFilter(filter, k)
                c = trainClassifier(estimation_set, classifier)
                score = c.Evaluate(validation_set, perf_metric)
                score_array.add(score)
            candidate_filters.append([filter, k, score_array])
    return selectBestFilter(candidate_filters)
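The procedure can be illustrated with a minimal Python sketch that uses only the standard library. It is not the implementation used in this work: the nearest-centroid classifier and the accuracy metric are simple stand-ins for random forest and the AUC, the two filters are toy examples, and every function name below (`stratified_folds`, `auto_filter`, `mean_gap_filter`, `variance_filter`, `nearest_centroid`, `accuracy`) is hypothetical. The sketch assumes binary labels coded as 0 and 1.

```python
import random
from statistics import mean, pvariance


def stratified_folds(labels, n_folds, seed=0):
    """Split sample indices into n_folds folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def auto_filter(filters, k_values, X, y, train, metric, n_folds=5):
    """Return (filter_name, k, mean_score) with the largest mean internal-CV score.

    filters: dict mapping a name to f(X_est, y_est) -> one score per feature
    train:   f(X_est, y_est) -> predict(row) callable
    metric:  f(y_true, y_pred) -> float, higher is better
    """
    folds = stratified_folds(y, n_folds)
    candidates = []
    for name, score_features in filters.items():
        for k in k_values:
            scores = []
            for held_out in folds:
                held = set(held_out)
                est_idx = [i for i in range(len(y)) if i not in held]
                X_est = [X[i] for i in est_idx]
                y_est = [y[i] for i in est_idx]
                # Feature scores are computed on the estimation set only, so
                # the validation fold never influences the selected features.
                feat_scores = score_features(X_est, y_est)
                order = sorted(range(len(feat_scores)), key=lambda j: -feat_scores[j])
                top = order[:k]
                model = train([[row[j] for j in top] for row in X_est], y_est)
                y_pred = [model([X[i][j] for j in top]) for i in held_out]
                scores.append(metric([y[i] for i in held_out], y_pred))
            candidates.append((name, k, mean(scores)))
    return max(candidates, key=lambda c: c[2])


def mean_gap_filter(X, y):
    """Toy supervised filter: absolute gap between per-class feature means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [abs(mean(a) - mean(b)) for a, b in zip(zip(*pos), zip(*neg))]


def variance_filter(X, y):
    """Toy unsupervised filter: per-feature variance (labels are ignored)."""
    return [pvariance(col) for col in zip(*X)]


def nearest_centroid(X, y):
    """Toy classifier standing in for the random forest."""
    centroids = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean(col) for col in zip(*rows)]

    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return predict


def accuracy(y_true, y_pred):
    """Stand-in metric; the procedure described here uses the AUC instead."""
    return mean(1.0 if a == b else 0.0 for a, b in zip(y_true, y_pred))


# Synthetic data: feature 0 is informative, features 1 and 2 are noise.
rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[label * 4.0 + rng.gauss(0, 1), rng.gauss(0, 5), rng.gauss(0, 1)] for label in y]

candidate_filters = {"mean_gap": mean_gap_filter, "variance": variance_filter}
best = auto_filter(candidate_filters, [1, 2], X, y, nearest_centroid, accuracy)
print(best)
```

As in the pseudocode, the feature scores are recomputed for every (filter, k) pair and every fold. Caching the scores per filter and fold would avoid the repeated work without changing the result, since the scores do not depend on k.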