In the feature subset selection problem, a learning algorithm must select a relevant subset of features on which to focus its attention while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Accuracy improves significantly on some datasets for both families of induction algorithms used: decision trees and Naive-Bayes.
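The wrapper idea described in the abstract, scoring each candidate feature subset by running the target learner itself, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's own design: the learner is a toy 1-nearest-neighbor classifier rather than a decision tree or Naive-Bayes, the search is plain greedy forward selection, subsets are scored by k-fold cross-validated accuracy, and the function names (`nn_predict`, `cv_accuracy`, `wrapper_forward_selection`) and the synthetic dataset are hypothetical.

```python
import random

def nn_predict(train, x, feats):
    """1-nearest-neighbor prediction using only the features in `feats`."""
    def dist(a):
        return sum((a[f] - x[f]) ** 2 for f in feats)
    # With an empty `feats` every distance is 0, so the first training
    # label is returned; this serves as a crude baseline.
    return min(train, key=lambda row: dist(row[0]))[1]

def cv_accuracy(data, feats, k=5):
    """k-fold cross-validated accuracy of the learner restricted to `feats`."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        correct += sum(nn_predict(train, x, feats) == y for x, y in test)
    return correct / len(data)

def wrapper_forward_selection(data, n_features, k=5):
    """Greedy forward search; the learner itself scores each candidate subset."""
    selected, best = [], cv_accuracy(data, [], k)
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(cv_accuracy(data, selected + [f], k), f) for f in candidates]
        if not scored:
            break  # every feature is already selected
        score, f = max(scored)
        if score <= best:
            break  # no single added feature improves estimated accuracy
        selected.append(f)
        best = score
    return selected, best

# Hypothetical synthetic data: feature 0 determines the label,
# features 1 and 2 are pure noise.
rng = random.Random(0)
data = []
for i in range(60):
    y = i % 2
    x = [y + rng.uniform(-0.1, 0.1), rng.random(), rng.random()]
    data.append((x, y))

selected, best = wrapper_forward_selection(data, n_features=3)
print(selected, best)
```

On this toy dataset the search should keep only feature 0, since the two noise features cannot raise the cross-validated accuracy any further. A filter method, by contrast, would rank features without ever consulting the learner.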