Wrappers for Feature Subset Selection
AIJ Special Issue on Relevance, 1997
Cited by 1133 (3 self)
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
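The wrapper idea in this abstract can be illustrated with a minimal sketch. It assumes a greedy forward search and a leave-one-out 1-nearest-neighbour learner, both chosen here only for brevity; the paper's own search strategy and induction algorithms (decision trees and Naive-Bayes) differ. The names `loo_accuracy` and `forward_select` are hypothetical.

```python
from collections import Counter


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-NN learner restricted to `features`.

    With no features selected, the learner falls back to predicting the
    majority class of the remaining instances.
    """
    correct = 0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        if features:
            nearest = min(
                others,
                key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
            )
            predicted = labels[nearest]
        else:
            predicted = Counter(labels[j] for j in others).most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(rows)


def forward_select(rows, labels):
    """Greedy wrapper search: add the feature that most improves the
    learner's estimated accuracy, and stop when nothing improves it."""
    n_features = len(rows[0])
    selected, best = [], loo_accuracy(rows, labels, [])
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score each candidate by running the learner itself (the "wrapper").
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in candidates]
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best


# Feature 0 separates the classes; feature 1 is misleading noise.
rows = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 5.1), (1.1, 0.9), (1.2, 9.1)]
labels = [0, 0, 0, 1, 1, 1]
selected, accuracy = forward_select(rows, labels)
```

The point of the sketch is that the evaluation function is the learner's own estimated accuracy, so the selected subset depends on how that particular learner interacts with the training set.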
Multivariate Decision Trees
1992
Cited by 123 (6 self)
Multivariate decision trees overcome a representational limitation of univariate decision trees: univariate decision trees are restricted to splits of the instance space that are orthogonal to the feature's axis. This paper discusses the following issues for constructing multivariate decision trees: representing a multivariate test, including symbolic and numeric features, learning the coefficients of a multivariate test, selecting the features to include in a test, and pruning of multivariate decision trees. We present some new and review some well-known methods for forming multivariate decision trees. The methods are compared across a variety of learning tasks to assess each method's ability to find concise, accurate decision trees. The results demonstrate that some multivariate methods are more effective than others. In addition, the experiments confirm that allowing multivariate tests improves the accuracy of the resulting decision tree over univariate trees.
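The difference between the two kinds of test can be sketched as follows. This assumes a multivariate test is a thresholded linear combination of numeric features, which is one common representation; the function names, weights, and data are illustrative only and are not taken from the paper.

```python
def univariate_test(x, feature, threshold):
    """Axis-orthogonal split: compares a single feature with a threshold."""
    return x[feature] > threshold


def multivariate_test(x, weights, threshold):
    """Oblique split: compares a weighted sum of features with a threshold."""
    return sum(w * v for w, v in zip(weights, x)) > threshold


# Hypothetical data whose class is True exactly when x0 + x1 > 1.
points = [(0.2, 0.3), (0.9, 0.3), (0.3, 0.9), (0.6, 0.1), (0.1, 0.6), (0.8, 0.8)]
classes = [x0 + x1 > 1 for x0, x1 in points]

# One multivariate test reproduces the class boundary exactly...
oblique = [multivariate_test(p, (1.0, 1.0), 1.0) for p in points]
# ...whereas a single univariate test on x0 misclassifies some points.
axis = [univariate_test(p, 0, 0.5) for p in points]
```

A univariate tree would need several axis-parallel splits to approximate the diagonal boundary that the single oblique test captures.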
Wrappers For Performance Enhancement And Oblivious Decision Graphs
1995
Cited by 111 (7 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Multiple Comparisons in Induction Algorithms
Machine Learning, 1998
Cited by 82 (10 self)
Authors: David Jensen and Paul R. Cohen, Experimental Knowledge Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst.
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
Keywords: inductive learning, overfitting, oversearching, attribute selection, hypothesis testing, parameter estimation
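A small worked example shows why selecting the maximum of many scores needs adjustment. It assumes the n comparisons are independent, which is a simplification made here and not a claim from the paper; the function names are hypothetical.

```python
def familywise_error(alpha, n):
    """Probability that at least one of n independent comparisons
    exceeds a threshold that each exceeds by chance with probability alpha."""
    return 1.0 - (1.0 - alpha) ** n


def bonferroni_alpha(alpha, n):
    """Bonferroni adjustment: per-comparison level that keeps the
    overall error rate at or below alpha."""
    return alpha / n


# Comparing 20 candidate items at the nominal 5% level.
unadjusted = familywise_error(0.05, 20)
adjusted = familywise_error(bonferroni_alpha(0.05, 20), 20)
```

Under these assumptions the unadjusted chance of at least one spurious "winner" among 20 items is roughly 64%, while the Bonferroni-adjusted level keeps it below the nominal 5%.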
Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm that Uses Optimal Pruning
In Proceedings of the Tenth International Conference on Machine Learning, 1993
Cited by 68 (1 self)
Determinations are a useful type of functional knowledge representation. Applications include knowledge-based systems, analogical reasoning, database design, and robotic sensing systems. This paper presents an efficient, batch algorithm for inducing all minimal determinations from observed data. The algorithm is based on breadth-first search and runs in polynomial time and space given a user-supplied parameter limiting the maximum size of a determination. The algorithm uses probabilistic measures to induce determinations despite noisy data. One key contribution is the identification of an enumeration order in the space of possible determinations that affords a complete and systematic search. Another contribution lists axioms that relate neighboring states and allow the construction of pruning rules. A third contribution formulates a perfect hash function for states in this space and facilitates optimal use of the pruning rules. This paper also sketches an algorithm that can incremental...
Exploring the decision forest: An empirical investigation of Occam's razor in decision tree induction
Journal of Artificial Intelligence Research, 1994
Cited by 61 (3 self)
We report on a series of experiments in which all decision trees consistent with the training data are constructed. These experiments were run to gain an understanding of the properties of the set of consistent decision trees and the factors that affect the accuracy of individual trees. In particular, we investigated the relationship between the size of a decision tree consistent with some training data and the accuracy of the tree on test data. The experiments were performed on a massively parallel MasPar computer. The results of the experiments on several artificial and two real-world problems indicate that, for many of the problems investigated, smaller consistent decision trees are on average less accurate than the average accuracy of slightly larger trees.
Use of Contextual Information for Feature Ranking and Discretization
1997
Cited by 51 (9 self)
Deriving classification rules or decision trees from examples is an important problem. When there are too many features, discarding weak features before the derivation process is highly desirable. When there are numeric features, they need to be discretized for the rule generation. We present a new approach to these problems. Traditional techniques make use of feature merits based on either the information theoretic or statistical correlation between each feature and the class. We instead assign merits to features by finding each feature's "obligation" to the class discrimination in the context of other features. The merits are then used to rank the features, select a feature subset, and discretize the numeric variables. Experience with benchmark example sets demonstrates that the new approach is a powerful alternative to the traditional methods. This paper concludes by posing some new technical issues that arise from this approach.
General and Efficient Multisplitting of Numerical Attributes
1999
Cited by 47 (7 self)
Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multipartition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations that is required to produce partitions which are optimal wit...
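The reduction in candidate cut points can be illustrated with a minimal sketch. It assumes the usual notion of a boundary point: a cut between two adjacent distinct attribute values is a boundary point unless every example at both values belongs to one and the same class. Placing the cut at the midpoint is an additional assumption made here, and `boundary_points` is a hypothetical name.

```python
from itertools import groupby


def boundary_points(values, labels):
    """Return candidate cut points that lie on class boundaries."""
    pairs = sorted(zip(values, labels))
    # Collect the set of classes observed at each distinct attribute value.
    groups = [
        (value, {label for _, label in group})
        for value, group in groupby(pairs, key=lambda p: p[0])
    ]
    cuts = []
    for (v1, c1), (v2, c2) in zip(groups, groups[1:]):
        # Skip the cut only when both neighbours are pure and share one class.
        if len(c1) > 1 or len(c2) > 1 or c1 != c2:
            cuts.append((v1 + v2) / 2)
    return cuts


values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "b", "b", "b", "a"]
cuts = boundary_points(values, labels)
```

In this example only two of the five possible cut points survive, so a well-behaved evaluation function would need to be computed on far fewer candidate partitions.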
An Efficient Two-Step Method for Classification of Spatial Data
Cited by 43 (1 self)
Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is a highly demanding field because very large amounts of spatial data have been collected in various applications, ranging from remote sensing, to geographical information systems (GIS), computer cartography, environmental assessment and planning, etc. In this paper, an efficient method for building decision trees for the classification of objects stored in geographic information databases is proposed and studied. Our approach to spatial classification is based on both (1) non-spatial properties of the classified objects and (2) attributes, predicates and functions describing spatial relations between classified objects and other features located in the spatial proximity of the classified objects. Several optimization techniques are explored, including a two-step spatial computation technique, use of spatial-join indices, etc. We implemented the algorithm and conducted experiments that showed the effectiveness of the proposed method.
Rule-based Machine Learning Methods for Functional Prediction
Journal of Artificial Intelligence Research, 1995
Cited by 42 (3 self)
We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance. 1. Introduction The problem of approximating the values of a continuous variable is described in the statistical literature as regression. Given samples of output (response) variable y and input (predictor) variables x = {x_1, ..., x_n}, the regression task is to find a mapping y = f(x). Relative to the spac...