Results 1  10
of
64
Wrappers for feature subset selection
 ARTIFICIAL INTELLIGENCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Abstract

Cited by 1036 (3 self)
 Add to MetaCart
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
Improved heterogeneous distance functions
 Journal of Artificial Intelligence Research
, 1997
"... Instancebased learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores cont ..."
Abstract

Cited by 202 (10 self)
 Add to MetaCart
Instancebased learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
Costsensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1995
"... This paper introduces ICET, a new algorithm for costsensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness ..."
Abstract

Cited by 155 (5 self)
 Add to MetaCart
This paper introduces ICET, a new algorithm for costsensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness
Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning
, 1996
"... This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neuralnetwork, decisiontree, rulebased, and casebased classification techniques. The sp ..."
Abstract

Cited by 107 (1 self)
 Add to MetaCart
This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neuralnetwork, decisiontree, rulebased, and casebased classification techniques. The specific problem tested involves disambiguating six senses of the word "line" using the words in the current and proceeding sentence as context. The statistical and neuralnetwork methods perform the best on this particular problem and we discuss a potential reason for this ob served difference. We also discuss the role of bias in machine ]earning and its importance in explaining performance differences observed on specific problems.
Addressing the Selective Superiority Problem: Automatic Algorithm/Model Class Selection
, 1993
"... The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; it is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In such cases one must search th ..."
Abstract

Cited by 63 (2 self)
 Add to MetaCart
The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; it is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In such cases one must search the space of available algorithms to find the one that produces the best classifier. In this paper we present an approach that applies knowledge about the representational biases of a set of learning algorithms to conduct this search automatically. In addition, the approach permits the available algorithms' model classes to be mixed in a recursive treestructured hybrid. We describe an implementation of the approach, MCS, that performs a heuristic bestfirst search for the best hybrid classifier for a set of data. An empirical comparison of MCS to each of its primitive learning algorithms, and to the computationally intensive method of crossvalidation, illustrates that automatic selection of l...
Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology
, 1995
"... In the wrapper approach to feature subset selection, a search for an optimal set of features is made using the induction algorithm as a blackbox. The estimated future performance of the algorithm is the heuristic guiding the search. Statistical methods for feature subset selection including for ..."
Abstract

Cited by 52 (0 self)
 Add to MetaCart
In the wrapper approach to feature subset selection, a search for an optimal set of features is made using the induction algorithm as a blackbox. The estimated future performance of the algorithm is the heuristic guiding the search. Statistical methods for feature subset selection including forward selection, backward elimination, and their stepwise variants can be viewed as simple hillclimbing techniques in the space of feature subsets. We utilize bestfirst searchtofindagood feature subset and discuss overfitting problems that may be associated with searching too many feature subsets. We introduce compound operators that dynamically change the topology of the search space to better utilize the information available from the evaluation of feature subsets. We show that compound operators unify previous approaches that deal with relevant and irrelevant features. The improved feature subset selection yields significant improvements for realworld datasets when using the ...
Automatic Parameter Selection by Minimizing Estimated Error
 In Proceedings of the Twelfth International Conference on Machine Learning
, 1995
"... We address the problem of finding the parameter settings that will result in optimal performance of a given learning algorithm using a particular dataset as training data. We describe a "wrapper" method, considering determination of the best parameters as a discrete function optimization p ..."
Abstract

Cited by 48 (4 self)
 Add to MetaCart
We address the problem of finding the parameter settings that will result in optimal performance of a given learning algorithm using a particular dataset as training data. We describe a "wrapper" method, considering determination of the best parameters as a discrete function optimization problem. The method uses bestfirst search and crossvalidation to wrap around the basic induction algorithm: the search explores the space of parameter values, running the basic algorithm many times on training and holdout sets produced by crossvalidation to get an estimate of the expected error of each parameter setting. Thus, the final selected parameter settings are tuned for the specific induction algorithm and dataset being studied. We report experiments with this method on 33 datasets selected from the UCI and StatLog collections using C4.5 as the basic induction algorithm. At a 90% confidence level, our method improves the performance of C4.5 on nine domains, degrades performance on one, and is...
Active Learning with Multiple Views
, 2002
"... Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multiview domains, in which there are several disjoint subsets of features (views), each of which i ..."
Abstract

Cited by 41 (1 self)
 Add to MetaCart
Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multiview domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we introduce CoTesting, which is the first approach to multiview active learning. Second, we extend the multiview learning framework by also exploiting weak views, which are adequate only for learning a concept that is more general/specific than the target concept. Finally, we empirically show that CoTesting outperforms existing active learners on a variety of real world domains such as wrapper induction, Web page classification, advertisement removal, and discourse tree parsing. 1.
Local Cascade Generalization
, 1998
"... In a previous work we have presented Cascade Generalization, a new general method for merging classifiers. The basic idea of Cascade Generalization is to sequentially run the set of classifiers, at each step performing an extension of the original data by the insertion of new attributes. The new att ..."
Abstract

Cited by 39 (1 self)
 Add to MetaCart
In a previous work we have presented Cascade Generalization, a new general method for merging classifiers. The basic idea of Cascade Generalization is to sequentially run the set of classifiers, at each step performing an extension of the original data by the insertion of new attributes. The new attributes are derived from the probability class distribution given by a base classifier. This constructive step extends the representational language for the high level classifiers, relaxing their bias. In this paper we extend this work by applying Cascade locally. At each iteration of a divide and conquer algorithm, a reconstruction of the instance space occurs by the addition of new attributes. Each new attribute represents the probability that an example belongs to a class given by a base classifier. We have implemented three Local Generalization Algorithms. The first merges a linear discriminant with a decision tree, the second merges a naive Bayes with a decision tree, and the third mer...
Connectionist theory refinement: Genetically searching the space of network topologies
 Journal of Artificial Intelligence Research
, 1997
"... An algorithm that learns from a set of examples should ideally be able to exploit the available resources of (a) abundant computing power and (b) domainspecific knowledge to improve its ability to generalize. Connectionist theoryrefinement systems, which use background knowledge to select a neural ..."
Abstract

Cited by 29 (1 self)
 Add to MetaCart
An algorithm that learns from a set of examples should ideally be able to exploit the available resources of (a) abundant computing power and (b) domainspecific knowledge to improve its ability to generalize. Connectionist theoryrefinement systems, which use background knowledge to select a neural network's topology and initial weights, have proven to be effective at exploiting domainspecific knowledge; however, most do not exploit available computing power. This weakness occurs because they lack the ability to refine the topology of the neural networks they produce, thereby limiting generalization, especially when given impoverished domain theories. We present the REGENT algorithm which uses (a) domainspecific knowledge to help create an initial population of knowledgebased neural networks and (b) genetic operators of crossover and mutation (specifically designed for knowledgebased networks) to continually search for better network topologies. Experiments on three realworld domains indicate that our new algorithm is able to significantly increase generalization compared to a standard connectionist theoryrefinement system, as well as our previous algorithm for growing knowledgebased networks.