Results 1 - 10 of 127
Wrappers for Feature Subset Selection
AIJ Special Issue on Relevance, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Cited by 1133 (3 self)

Abstract:
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive Bayes.
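To make the wrapper idea concrete, here is a minimal sketch (assuming scikit-learn-style estimators and NumPy arrays; the paper's own search is more sophisticated, e.g. best-first search with compound operators): greedy forward selection in which candidate subsets are scored by the cross-validated accuracy of the very learner that will use them.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def wrapper_forward_select(X, y, estimator, cv=5):
    """Greedily add whichever feature most improves CV accuracy."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining:
        # Score every one-feature extension by the learner's own CV accuracy.
        score, f = max((cross_val_score(estimator, X[:, selected + [f]],
                                        y, cv=cv).mean(), f)
                       for f in remaining)
        if score <= best_score:       # no remaining feature helps; stop
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score

# e.g. feats, acc = wrapper_forward_select(X, y, GaussianNB())
```

Because the learner itself supplies the score, the selected subset is tailored to that learner and training set, which is the point of the wrapper approach over a filter such as Relief.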
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Cited by 164 (1 self)

Abstract:
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction
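As an illustration of the recurring core step the survey identifies across disciplines, here is a minimal sketch of greedy split selection by information gain, one common criterion among those surveyed; real inducers add stopping rules, pruning and missing-value handling.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(X, y):
    """Return (feature, threshold, gain) with maximum information gain."""
    base, best = entropy(y), (None, None, 0.0)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:      # candidate thresholds
            left = X[:, f] <= t
            gain = base - (left.mean() * entropy(y[left])
                           + (~left).mean() * entropy(y[~left]))
            if gain > best[2]:
                best = (f, t, gain)
    return best
```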
Stochastic Dynamic Programming with Factored Representations
1997
"... Markov decision processes(MDPs) have proven to be popular models for decisiontheoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, statebased specifications and computations. To alleviate the combinatorial problems associated with such methods, we propo ..."
Cited by 158 (10 self)

Abstract:
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions. This generally obviates the need for state-by-state computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decision-theoretic generalization of classic regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems ...
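For contrast, here is a sketch of the explicit, state-based computation the paper seeks to avoid: tabular value iteration, which performs a Bellman backup for every single state on every sweep. The paper's structured algorithms perform the same backup on decision-tree representations, so all states aggregated at one leaf are updated at once. Array shapes here are illustrative.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P: (A, S, S) per-action transition matrices; R: (S,) rewards."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V)       # one Bellman backup per action: (A, S)
        V_new = Q.max(axis=0)         # greedy over actions
        if np.abs(V_new - V).max() < tol:
            return V_new
        V = V_new
```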
Relational Reinforcement Learning
2001
"... Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Qfunctions, relational reinforcement learni ..."
Cited by 116 (9 self)

Abstract:
Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Q-functions, relational reinforcement learning can potentially be applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where it is assumed that the effects of the actions are unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning, such as the ability to abstract from the specific goals pursued and to exploit the results of previous learning phases when addressing new (more complex) situations.
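The update underneath relational reinforcement learning is ordinary Q-learning; a minimal sketch follows, with plain hashable values standing in for relational state and action terms, and a dict standing in for the paper's logical regression tree over Q-values.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning backup; Q maps (state, action) pairs to values."""
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
# e.g. q_update(Q, s="on(a,b)", a="move(a,floor)", r=0.0,
#               s_next="clear(b)", actions=["move(a,b)", "move(b,a)"])
```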
BOAT: Optimistic Decision Tree Construction
1999
"... Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers are decision ..."
Cited by 105 (2 self)

Abstract:
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers is decision trees. All current algorithms to construct decision trees, including all main-memory algorithms, make one scan over the training database per level of the tree. We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any differen...
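A highly simplified structural sketch of the optimistic strategy follows; the helper callables (fit_tree, split_agrees, refit_subtree) are hypothetical placeholders, not BOAT's actual machinery, which verifies sampled split points with bootstrapped confidence intervals.

```python
def optimistic_tree(sample, full_data, fit_tree, split_agrees, refit_subtree):
    """Fit optimistically on a sample; repair only where the full data disagrees."""
    tree = fit_tree(sample)              # cheap: the sample fits in memory
    for node in tree.internal_nodes():   # all checks share one database scan
        if not split_agrees(node, full_data):
            refit_subtree(node, full_data)   # fall back only where needed
    return tree
```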
Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift
2003
"... Algorithms for tracking concept drift are important for many applications. We present a general method based on the Weighted Majority algorithm for using any online learner for concept drift. Dynamic Weighted Majority (DWM) maintains an ensemble of base learners, predicts using a weightedmajority ..."
Cited by 61 (0 self)

Abstract:
Algorithms for tracking concept drift are important for many applications. We present a general method based on the Weighted Majority algorithm for using any online learner to track concept drift. Dynamic Weighted Majority (DWM) maintains an ensemble of base learners, predicts using a weighted-majority vote of these "experts", and dynamically creates and deletes experts in response to changes in performance. We empirically evaluated two experimental systems based on the method, using incremental naive Bayes and Incremental Tree Inducer (ITI) as experts.
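The abstract pins down the algorithm well enough for a sketch of one DWM step. The predict/update methods on the base learners are an assumed interface, and the published algorithm applies the weight updates and pruning only every p examples, a detail elided here.

```python
def dwm_step(experts, weights, x, y, new_expert, beta=0.5, theta=0.01):
    """One online step of Dynamic Weighted Majority."""
    # Weighted-majority vote of the current experts.
    preds = [e.predict(x) for e in experts]
    votes = {}
    for p, w in zip(preds, weights):
        votes[p] = votes.get(p, 0.0) + w
    global_pred = max(votes, key=votes.get)

    # Multiplicatively decay the weight of every mistaken expert.
    weights = [w * beta if p != y else w for p, w in zip(preds, weights)]

    # Normalize, prune weak experts, and add a fresh one if the ensemble erred.
    top = max(weights)
    kept = [(e, w / top) for e, w in zip(experts, weights) if w / top >= theta]
    experts, weights = [e for e, _ in kept], [w for _, w in kept]
    if global_pred != y:
        experts.append(new_expert())
        weights.append(1.0)

    # Incrementally train every expert on the new example.
    for e in experts:
        e.update(x, y)
    return experts, weights, global_pred
```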
Mining Surprising Patterns Using Temporal Description Length
1998
"... We propose a new notion of surprising temporal patterns in market basket data, and algorithms to find such patterns. This is distinct from finding frequent patterns as addressed in the common mining literature. We argue that once the analyst is already familiar with prevalent patterns in the data, t ..."
Cited by 58 (0 self)

Abstract:
We propose a new notion of surprising temporal patterns in market basket data, and algorithms to find such patterns. This is distinct from finding frequent patterns as addressed in the common mining literature. We argue that once the analyst is familiar with prevalent patterns in the data, the greatest incremental benefit is likely to come from changes in the relationship between item frequencies over time. A simple measure of surprise is the extent of departure from a model estimated using standard multivariate time series analysis. Unfortunately, such estimation involves models, smoothing windows and parameters whose optimal choices can vary dramatically from one application to another. In contrast, we propose a precise characterization of surprise based on the number of bits in which a basket sequence can be encoded under a carefully chosen coding scheme. In this scheme it is inexpensive to encode sequences of itemsets that have steady, hence likely to be well-known, correla...
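A toy version of the coding idea, under simplifying assumptions (a single itemset's presence/absence sequence and a two-part piecewise-constant Bernoulli code, standing in for the paper's scheme): a sequence whose frequency shifts mid-stream is cheaper to encode in two segments than in one, and the bit saving quantifies the surprise.

```python
from math import comb, log2

def segment_bits(seg):
    """Bits to code a 0/1 segment: parameter cost plus log2 C(n, k) for the data."""
    n, k = len(seg), sum(seg)
    return log2(n + 1) + log2(comb(n, k))

def surprise(seq, cut):
    """Bit saving from coding seq with a change point at `cut`."""
    return segment_bits(seq) - (segment_bits(seq[:cut]) + segment_bits(seq[cut:]))

# A mid-stream frequency shift is cheap to code in two parts, so it scores
# high: surprise([0]*20 + [1]*20, cut=20) is large and positive, while a
# steady pattern scores near zero: surprise([0, 1]*20, cut=20).
```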
Classification trees with unbiased multiway splits
Journal of the American Statistical Association, 2001
"... Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods i ..."
Cited by 56 (8 self)

Abstract:
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in the number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations. Key words and phrases: Decision tree, linear discriminant analysis, missing value, selection bias.
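A sketch of the bias-avoidance idea (assuming SciPy; the discretization and test are simplified relative to the paper's method): choose the split variable by a chi-square test of a binned feature against the class, so a variable is not favored merely because it offers more candidate split points; the split point itself is then found separately.

```python
import numpy as np
from scipy.stats import chi2_contingency

def select_variable(X, y):
    """Pick the split variable by chi-square p-value, not by split search."""
    classes = np.unique(y)
    pvals = []
    for f in range(X.shape[1]):
        # Quartile-bin the feature (assumed continuous and non-constant),
        # then test its association with the class label.
        edges = np.quantile(X[:, f], [0.25, 0.5, 0.75])
        binned = np.digitize(X[:, f], edges)
        table = np.array([[np.sum((binned == b) & (y == c)) for c in classes]
                          for b in np.unique(binned)])
        _, p, _, _ = chi2_contingency(table)
        pvals.append(p)
    return int(np.argmin(pvals))    # most significant association wins
```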
Selecting Examples for Partial Memory Learning
Machine Learning, 2000
"... . This paper describes a method for selecting training examples for a partial memory learning system. The method selects extreme examples that lie at the boundaries of concept descriptions and uses these examples with new training examples to induce new concept descriptions. Forgetting mechanisms al ..."
Cited by 41 (4 self)

Abstract:
This paper describes a method for selecting training examples for a partial memory learning system. The method selects extreme examples that lie at the boundaries of concept descriptions and uses these examples with new training examples to induce new concept descriptions. Forgetting mechanisms may also be active to remove examples from partial memory that are irrelevant or outdated for the learning task. Using an implementation of the method, we conducted a lesion study and a direct comparison to examine the effects of partial memory learning on predictive accuracy and on the number of training examples maintained during learning. These experiments involved the STAGGER Concepts, a synthetic problem, and two real-world problems: a blasting cap detection problem and a computer intrusion detection problem. Experimental results suggest that the partial memory learner notably reduced memory requirements at the slight expense of predictive accuracy, and tracked concept drift as well as ot...
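A rough sketch of extreme-example selection, with axis-parallel class bounding boxes standing in for the paper's induced concept descriptions: keep only the examples that lie on some face of their own class's bounding box.

```python
import numpy as np

def extreme_examples(X, y):
    """Keep the examples on the boundary of their class's bounding box."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        Xc = X[idx]
        # An example is "extreme" if it attains the class min or max
        # in at least one attribute.
        on_face = ((Xc == Xc.min(axis=0)) | (Xc == Xc.max(axis=0))).any(axis=1)
        keep[idx[on_face]] = True
    return X[keep], y[keep]
```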