Results 1–10 of 95
A System for Induction of Oblique Decision Trees
 Journal of Artificial Intelligence Research
, 1994
Abstract

Cited by 251 (13 self)
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.

1. Introduction

Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
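The abstract's split-search idea — hill-climbing on hyperplane coefficients, with randomization to escape local minima — can be sketched as follows. This is a toy illustration under assumed conventions (Gini impurity, one random coordinate step per iteration), not OC1's actual code; all names are invented.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(w, X, y):
    """Weighted Gini impurity of the two sides of the hyperplane w·[x, 1] > 0."""
    side = X @ w[:-1] + w[-1] > 0
    n = len(y)
    return (side.sum() * gini(y[side]) + (~side).sum() * gini(y[~side])) / n

def hill_climb_split(X, y, iters=200, seed=0):
    """Perturb one hyperplane coefficient at a time, keeping improvements."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1] + 1)          # random initial hyperplane
    best = split_impurity(w, X, y)
    for _ in range(iters):
        cand = w.copy()
        cand[rng.integers(len(w))] += rng.normal()  # random coordinate step
        score = split_impurity(cand, X, y)
        if score < best:
            w, best = cand, score
    return w, best
```

Note that an axis-parallel test is the special case where all but one coefficient of `w` is zero, which is why oblique splits can only enlarge the space of trees considered.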
Operations for Learning with Graphical Models
 Journal of Artificial Intelligence Research
, 1994
Abstract

Cited by 249 (12 self)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided, including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation-maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
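Of the two algorithm schemas mentioned, Gibbs sampling is the easier to show in a few lines: sample each variable in turn from its full conditional given the rest. Below is a minimal sketch for a standard bivariate normal with correlation rho (an assumed example distribution, not one taken from the paper).

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=20000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is itself Gaussian:
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    so the sampler simply alternates draws from the two conditionals.
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)
    x = y = 0.0
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        x = rng.normal(rho * y, sd)
        y = rng.normal(rho * x, sd)
        samples[i] = (x, y)
    return samples
```

With enough draws, the empirical correlation of the chain approaches rho, which is a quick sanity check on the conditionals.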
Feature Selection for Classification
 Intelligent Data Analysis
, 1997
Abstract

Cited by 171 (9 self)
Feature selection has been the focus of interest for quite some time and much work has been done. With the creation of huge databases and the consequent requirements for good machine learning techniques, new problems arise and novel approaches to feature selection are in demand. This survey is a comprehensive overview of many existing methods from the 1970s to the present. It identifies four steps of a typical feature selection method, categorizes the different existing methods in terms of generation procedures and evaluation functions, and reveals hitherto unattempted combinations of generation procedures and evaluation functions. Representative methods are chosen from each category for detailed explanation and discussion via example. Benchmark datasets with different characteristics are used for comparative study. The strengths and weaknesses of different methods are explained. Guidelines for applying feature selection methods are given based on data types and domain characteris...
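The survey's decomposition of a method into a generation procedure plus an evaluation function can be made concrete with a small sketch: sequential forward generation paired with a deliberately naive, invented class-separation evaluation. Any evaluation function with the same signature could be swapped in.

```python
import numpy as np

def forward_select(X, y, evaluate, k):
    """Sequential forward generation: grow the feature subset one index at a
    time, at each step adding the feature whose inclusion scores best under
    the supplied evaluation function."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        scores = [(evaluate(X[:, selected + [j]], y), j) for j in remaining]
        _, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

def centroid_separation(Xs, y):
    """Toy two-class evaluation function: distance between class-mean vectors."""
    classes = np.unique(y)
    means = [Xs[y == c].mean(axis=0) for c in classes]
    return float(np.linalg.norm(means[0] - means[1]))
```

Replacing `centroid_separation` with, say, cross-validated accuracy of a classifier turns this filter-style sketch into a wrapper method, which is exactly the kind of combination the survey catalogs.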
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
 Data Mining and Knowledge Discovery
, 1997
Abstract

Cited by 146 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.

Keywords: classification, tree-structured classifiers, data compaction

1. Introduction

Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects, e.g., the Human Genome...
Separate-and-conquer rule learning
 Artificial Intelligence Review
, 1999
Abstract

Cited by 135 (29 self)
This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity, as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of algorithms into a single framework and analyze them along three different dimensions, namely their search, language and overfitting avoidance biases.
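The strategy itself fits in a few lines: learn one rule, separate off the examples it covers, and conquer the remainder. The sketch below pairs the generic covering loop with a deliberately naive single-threshold rule learner; both are invented illustrations, not any surveyed system.

```python
def separate_and_conquer(examples, learn_one_rule):
    """Generic covering loop over (feature_tuple, label) pairs: learn a rule
    for the positives, remove the examples it covers ("separate"), and
    recurse on the rest ("conquer")."""
    rules = []
    while any(label == 1 for _, label in examples):
        rule = learn_one_rule(examples)
        rules.append(rule)
        examples = [(x, lab) for x, lab in examples if not rule(x)]
    return rules

def learn_threshold_rule(examples):
    """Toy rule learner: pick the single test x[f] >= t that covers the most
    positives while covering no negatives (assumes such a test exists)."""
    best, best_cov = None, -1
    for f in range(len(examples[0][0])):
        for t in {x[f] for x, _ in examples}:
            cov = [lab for x, lab in examples if x[f] >= t]
            if cov and all(lab == 1 for lab in cov) and sum(cov) > best_cov:
                best, best_cov = (f, t), sum(cov)
    f, t = best
    return lambda x: x[f] >= t
```

The three biases the paper analyzes map directly onto this skeleton: the search bias lives in `learn_one_rule`, the language bias in what a rule is allowed to test, and the overfitting avoidance bias in when the loop stops or prunes.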
The Lack of A Priori Distinctions Between Learning Algorithms
, 1996
Abstract

Cited by 123 (5 self)
This is the first of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low; the Vapnik-Chervonenkis dimension of your generalizer is small; and the trainin...
Incremental Reduced Error Pruning
, 1994
Abstract

Cited by 120 (23 self)
This paper outlines some problems that may occur with Reduced Error Pruning in Inductive Logic Programming, most notably efficiency. Thereafter a new method, Incremental Reduced Error Pruning, is proposed that attempts to address all of these problems. Experiments show that in many noisy domains this method is much more efficient than alternative algorithms, along with a slight gain in accuracy. However, the experiments also show that the use of this algorithm cannot be recommended for domains with a very specific concept description.

OEFAI-TR-94-09

1 Introduction

Being able to deal with noisy data is a must for algorithms that are meant to learn concepts in real-world domains. Significant effort has gone into investigating the effect of noisy data on decision tree learning algorithms (see e.g. [Quinlan, 1993, Breiman et al., 1984]). Not surprisingly, noise handling methods have also entered the emerging field of Inductive Logic Programming (ILP) [Muggleton, 1992]. Linus [Lavr...
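For context, classical reduced error pruning of a single conjunctive rule can be sketched as below: greedily delete conditions while accuracy on a held-out validation set does not drop. This is the baseline method whose efficiency the paper criticizes, not the incremental variant it proposes; all names are invented.

```python
def rule_accuracy(conditions, data):
    """Accuracy of 'positive iff all conditions hold' on (x, label) pairs."""
    return sum(all(c(x) for c in conditions) == (lab == 1)
               for x, lab in data) / len(data)

def reduced_error_prune(conditions, validation):
    """Greedily delete conditions as long as validation accuracy does not
    drop; ties favor the shorter rule."""
    conditions = list(conditions)
    improved = True
    while improved and conditions:
        improved = False
        for i in range(len(conditions)):
            trimmed = conditions[:i] + conditions[i + 1:]
            if rule_accuracy(trimmed, validation) >= rule_accuracy(conditions, validation):
                conditions = trimmed
                improved = True
                break
    return conditions
```

The efficiency problem the paper targets is visible even here: every candidate deletion re-scores the whole validation set, and in the ILP setting the whole theory is learned before any pruning starts, whereas the incremental variant prunes each rule as it is learned.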
Oversearching and Layered Search in Empirical Learning
 In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence
, 1995
Abstract

Cited by 86 (0 self)
When learning classifiers, more extensive search for rules is shown to lead to lower predictive accuracy on many of the real-world domains investigated. This counterintuitive result is particularly relevant to recent systematic search methods that use risk-free pruning to achieve the same outcome as exhaustive search. We propose an iterated search method that commences with greedy search, extending its scope at each iteration until a stopping criterion is satisfied. This layered search is often found to produce theories that are more accurate than those obtained with either greedy search or moderately extensive beam search.

1 Introduction

Mitchell [1982] observes that the generalization implicit in learning from examples can be viewed as a search over the space of possible theories. From this perspective, most machine learning methods carry out a series of local searches in the vicinity of the current theory, selecting at each step the most promising improvement. Covering algorithms ...
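One way to read the proposal is as a beam search whose width grows between iterations until a wider beam stops helping. The sketch below is that reading, with invented names and an assumed doubling schedule, not the paper's exact procedure.

```python
def beam_search(evaluate, neighbors, start, width):
    """One beam-search pass: keep the `width` best states at each depth,
    stopping when expansion no longer improves the best score seen."""
    beam = [start]
    best_state, best_score = start, evaluate(start)
    while True:
        candidates = [n for s in beam for n in neighbors(s)]
        if not candidates:
            return best_state, best_score
        candidates.sort(key=evaluate, reverse=True)
        beam = candidates[:width]
        score = evaluate(beam[0])
        if score <= best_score:
            return best_state, best_score
        best_state, best_score = beam[0], score

def layered_search(evaluate, neighbors, start, max_width=8):
    """Start with greedy search (beam width 1) and double the width each
    layer, stopping once a wider beam no longer improves the best score."""
    width = 1
    state, score = beam_search(evaluate, neighbors, start, 1)
    while width < max_width:
        width *= 2
        s, sc = beam_search(evaluate, neighbors, start, width)
        if sc <= score:
            break
        state, score = s, sc
    return state, score
```

A width-1 pass is exactly greedy hill-climbing, so the layered scheme only pays for wider (more oversearch-prone) passes when they demonstrably improve on the greedy result.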
A survey of evolutionary algorithms for data mining and knowledge discovery
 In: A. Ghosh, and S. Tsutsui (Eds.) Advances in Evolutionary Computation
, 2002
Abstract

Cited by 84 (3 self)
This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.
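As a minimal concrete instance of the three ingredients listed — individual representation, genetic operators, fitness function — here is a bare-bones generational GA over bitstrings (e.g., one bit per candidate attribute or rule condition). The onemax fitness (`sum`) stands in for a real rule-quality measure; everything here is an invented illustration, not an algorithm from the chapter.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=30, generations=40,
                      p_mut=0.05, seed=0):
    """Minimal generational GA over fixed-length bitstrings, using
    tournament selection, one-point crossover, and per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)          # binary tournament selection
        return max(a, b, key=fitness)

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)                       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

Swapping the representation (e.g., a tree of conditions for genetic programming) and the fitness function (e.g., rule accuracy minus a complexity penalty) is precisely the adaptation the chapter discusses.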
Data Mining with an Ant Colony Optimization Algorithm
 IEEE Transactions on Evolutionary Computation
, 2002
Abstract

Cited by 83 (13 self)
This work proposes an algorithm for data mining called Ant-Miner (Ant Colony-based Data Miner). The goal of Ant-Miner is to extract classification rules from data. The algorithm is inspired by both research on the behavior of real ant colonies and some data mining concepts and principles. We compare the performance of Ant-Miner with CN2, a well-known data mining algorithm for classification, in six public domain data sets. The results provide evidence that: (a) Ant-Miner is competitive with CN2 with respect to predictive accuracy; and (b) the rule lists discovered by Ant-Miner are considerably simpler (smaller) than those discovered by CN2.

Index Terms – Ant Colony Optimization, data mining, knowledge discovery, classification.
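The core ACO mechanics behind this family of rule miners — probabilistic term choice biased by pheromone and a heuristic value, followed by quality-proportional reinforcement with evaporation — can be sketched generically. This is not the Ant-Miner implementation; function names, parameters and the update rule are assumed for illustration.

```python
import random

def roulette(weights, rng):
    """Pick an index with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= r:
            return i
    return len(weights) - 1

def ant_construct_rule(n_terms, pheromone, heuristic, rule_len, rng):
    """An 'ant' assembles a rule by repeatedly sampling unused terms with
    probability proportional to pheromone[i] * heuristic[i]."""
    chosen, available = [], list(range(n_terms))
    for _ in range(rule_len):
        weights = [pheromone[i] * heuristic[i] for i in available]
        chosen.append(available.pop(roulette(weights, rng)))
    return chosen

def reinforce(pheromone, rule, quality, evaporation=0.1):
    """Evaporate all trails, then deposit pheromone on the chosen terms."""
    for i in range(len(pheromone)):
        pheromone[i] *= 1.0 - evaporation
    for i in rule:
        pheromone[i] += quality
```

Over many ants, terms that appear in high-quality rules accumulate pheromone and are sampled more often, which is the positive-feedback loop the abstract attributes to real ant colonies.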