Results 1-10 of 60
Wrappers for Feature Subset Selection
AIJ SPECIAL ISSUE ON RELEVANCE, 1997
Cited by 1522 (3 self)
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
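To make the wrapper idea in this abstract concrete, here is a minimal sketch in which the search scores each candidate feature subset by running the learner itself on that subset. The greedy forward search, the 1-nearest-neighbour learner, and the leave-one-out estimate are assumptions chosen for brevity; they are not the paper's setup, which uses decision trees and Naive-Bayes. The names `loo_accuracy_1nn` and `forward_select` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Subset = Tuple[int, ...]


def loo_accuracy_1nn(X: List[Row], y: List[int], features: Subset) -> float:
    """Leave-one-out accuracy of a 1-NN classifier that sees only `features`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = -1, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # hold out the instance being classified
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def forward_select(
    X: List[Row],
    y: List[int],
    evaluate: Callable[[List[Row], List[int], Subset], float],
) -> Tuple[Subset, float]:
    """Greedy forward search: add the feature that most improves the estimate."""
    selected: Subset = ()
    best_score = 0.0
    while True:
        candidates = [selected + (f,) for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        score, subset = max(((evaluate(X, y, c), c) for c in candidates),
                            key=lambda t: t[0])
        if score <= best_score:
            break  # no candidate improves the estimated accuracy
        selected, best_score = subset, score
    return selected, best_score


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 1.2), (1.1, 5.1), (1.2, 8.9)]
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y, loo_accuracy_1nn))  # ((0,), 1.0)
```

Because `evaluate` is a parameter, the same search can wrap any learner, which is the point of tailoring the subset to a particular algorithm and domain.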
Wrappers For Performance Enhancement And Oblivious Decision Graphs
1995
Cited by 122 (7 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Iterative Optimization and Simplification of Hierarchical Clusterings
Journal of Artificial Intelligence Research, 1995
Cited by 120 (3 self)
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a `tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been construct...
Boosting Applied to Word Sense Disambiguation
IN PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON MACHINE LEARNING, 2000
Cited by 66 (9 self)
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.
General and Efficient Multisplitting of Numerical Attributes
1999
Cited by 53 (7 self)
Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multipartition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations that is required to produce partitions which are optimal wit...
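The reduction in candidate cut points that this abstract describes can be illustrated with a small sketch. It assumes the usual notion of a boundary point: a cut between two adjacent distinct attribute values is skipped only when both values carry instances of one and the same class. Placing each cut at the midpoint is an illustrative choice, and the function name `boundary_points` is hypothetical; the paper's search for optimal multisplits is not reproduced here.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def boundary_points(values: Sequence[float], labels: Sequence[Hashable]) -> List[float]:
    """Return candidate cut points that lie on class boundaries."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    # One bin per distinct value, holding the set of classes seen at that value.
    bins = [(v, {lab for _, lab in grp})
            for v, grp in groupby(pairs, key=lambda p: p[0])]
    cuts = []
    for (v_lo, c_lo), (v_hi, c_hi) in zip(bins, bins[1:]):
        both_pure_same_class = len(c_lo) == 1 and c_lo == c_hi
        if not both_pure_same_class:
            cuts.append((v_lo + v_hi) / 2)  # midpoint is an assumption
    return cuts


# Five distinct values give four possible cuts, but only two are boundary points.
print(boundary_points([1, 2, 3, 4, 5], ["a", "a", "b", "b", "a"]))  # [2.5, 4.5]
```

For an evaluation function with the property the abstract calls well-behavedness, only the cuts returned here would need to be examined.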
An Improved Algorithm for Incremental Induction of Decision Trees
In Proceedings of the Eleventh International Conference on Machine Learning, 1994
Cited by 50 (4 self)
This paper presents an algorithm for incremental induction of decision trees that is able to handle both numeric and symbolic variables. In order to handle numeric variables, a new tree revision operator called `slewing' is introduced. Finally, a non-incremental method is given for finding a decision tree based on a direct metric of a candidate tree. Contents: 1 Introduction; 2 Design Goals; 3 An Improved Algorithm (3.1 Incorporating a Training Instance; 3.2 Ensuring a Best Test at Each Decision Node; 3.3 Information Kept at a Decision Node; 3.4 Tree Transposition; 3.5 Slewing a Cutpoint; 3.6 How to Ensure a Best Test Everywhere); 4 Incremental Training Cost; 5 Error-Correction Mo...
Efficient Incremental Induction of Decision Trees
1995
Cited by 24 (0 self)
This paper proposes a method to improve ID5R, an incremental TDIDT algorithm. The new method evaluates the quality of attributes selected at the nodes of a decision tree and estimates a minimum number of steps for which these attributes are guaranteed such a selection. This results in reducing overheads during incremental learning. The method is supported by theoretical analysis and experimental results. Keywords: Incremental algorithm, decision tree induction
A Kolmogorov-Smirnoff Metric for Decision Tree Induction
1996
Cited by 17 (1 self)
In 1977, Friedman demonstrated that Kolmogorov-Smirnoff distance could be employed effectively as a test selection metric for decision tree induction. We revisit this metric and modify it to handle multiple classes within a single tree, and to be sensitive to missing data values. Empirical results for a large sample of learning tasks, comparing this metric to the gain ratio metric, show a highly significant reduction in tree size and expected number of tests for classification, without a significant change in classification accuracy. 1 Introduction: Top-down induction of decision trees is driven by greedy selection of a partition of the training instances that maximizes a heuristic function of that partition. The heuristic function is often called the test selection metric, but it is also known as the splitting criterion, the attribute selection metric, and the partition merit function. A good test selection metric should have a higher value for a better partition, but whether one pa...
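As a rough illustration of a Kolmogorov-Smirnoff distance used as a test selection metric, the sketch below handles only the two-class, single numeric attribute case: it picks the cut that maximizes the absolute difference between the two class-conditional empirical cumulative distributions. The function name `ks_best_cut` and the 0/1 class encoding are assumptions, and the multi-class and missing-value modifications described in the abstract are not reproduced.

```python
from typing import List, Tuple


def ks_best_cut(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (distance, cut) maximizing |F0(cut) - F1(cut)| for the test value <= cut."""
    n0, n1 = labels.count(0), labels.count(1)
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(values, labels))
    seen0 = seen1 = 0
    best_dist, best_cut = 0.0, pairs[0][0]
    for i, (v, label) in enumerate(pairs):
        if label == 0:
            seen0 += 1
        else:
            seen1 += 1
        # Evaluate only after the last instance of a run of tied values.
        if i + 1 < len(pairs) and pairs[i + 1][0] == v:
            continue
        dist = abs(seen0 / n0 - seen1 / n1)
        if dist > best_dist:
            best_dist, best_cut = dist, v
    return best_dist, best_cut


# Perfectly separable toy data: the distance reaches its maximum of 1.0 at cut 3.
print(ks_best_cut([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (1.0, 3)
```

A tree inducer following this scheme would compute the best cut for each numeric attribute at a node and split on the attribute with the largest distance.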
Theoretical Comparison between the Gini Index and Information Gain Criteria
Annals of Mathematics and Artificial Intelligence, 2000
Cited by 17 (0 self)
Knowledge Discovery in Databases (KDD) is an active and important research area with the promise of a high payoff in many business and scientific applications. One of the main tasks in KDD is classification. A particularly efficient method for classification is decision tree induction. The selection of the attribute used at each node of the tree to split the data (split criterion) is crucial in order to correctly classify objects. Different split criteria have been proposed in the literature (Information Gain, Gini Index, etc.). It is not obvious which of them will produce the best decision tree for a given data set. A large number of empirical tests have been conducted in order to answer this question. No conclusive results were found.
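For readers unfamiliar with the two split criteria named in this abstract, the following sketch computes both on a candidate split, using the standard textbook definitions (Gini impurity and Shannon entropy in bits). The helper names `gini`, `entropy`, and `split_gain` are hypothetical, and nothing here reproduces the paper's theoretical comparison.

```python
from collections import Counter
from math import log2
from typing import Callable, Hashable, List, Sequence

Labels = Sequence[Hashable]


def gini(labels: Labels) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Labels) -> float:
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(parent: Labels, children: List[Labels],
               impurity: Callable[[Labels], float]) -> float:
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted


parent = [0, 0, 1, 1]
children = [[0, 0], [1, 1]]                    # a split that separates the classes
print(split_gain(parent, children, gini))      # 0.5
print(split_gain(parent, children, entropy))   # 1.0 (Information Gain)
```

Passing the impurity function as an argument keeps the gain computation identical for both criteria, so any difference in the chosen split comes from the impurity measure alone.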
A Machine Learning Approach to POS Tagging
1998
Cited by 15 (1 self)
We have applied the inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities, consisting of a set of statistical decision trees expressing distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation labelling based tagger. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine...