Results 1-10 of 134
A System for Induction of Oblique Decision Trees
Journal of Artificial Intelligence Research, 1994
Cited by 295 (14 self)
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. 1. Introduction Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
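To make the contrast between oblique and axis-parallel splits concrete, here is a minimal sketch. It is not the OC1 algorithm: it only illustrates searching for a hyperplane by perturbing one coefficient at a time, with random restarts standing in for the randomization the abstract mentions. The function names (`side`, `errors`, `hill_climb`), the step sizes, the error-count criterion, and the toy grid data are all assumptions introduced for illustration.

```python
import random

def side(w, x):
    """True if point x lies on the positive side of hyperplane w.
    w has len(x) + 1 entries; the last entry is the bias term."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0

def errors(w, points, labels):
    """Misclassification count of the split, taking the better of the
    two ways of assigning class labels to the two sides."""
    wrong = sum(side(w, x) != bool(y) for x, y in zip(points, labels))
    return min(wrong, len(points) - wrong)

def hill_climb(points, labels, restarts=5, steps=200, seed=0):
    """Hypothetical randomized hill-climbing over hyperplane coefficients."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Baseline candidate: an axis-parallel split on the first attribute.
    best_w = [1.0] + [0.0] * (dim - 1) + [-0.5]
    best_err = errors(best_w, points, labels)
    for _ in range(restarts):
        # Random restart: begin from a random hyperplane.
        w = [rng.uniform(-1, 1) for _ in range(dim + 1)]
        err = errors(w, points, labels)
        for _ in range(steps):
            # Perturb a single coefficient; keep the change if not worse.
            i = rng.randrange(dim + 1)
            cand = list(w)
            cand[i] += rng.uniform(-0.5, 0.5)
            cand_err = errors(cand, points, labels)
            if cand_err <= err:
                w, err = cand, cand_err
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Toy data (assumed): a 5x5 integer grid whose classes are separated by
# the oblique line x + y = 4, which no single axis-parallel test matches.
points = [(i, j) for i in range(5) for j in range(5)]
labels = [int(i + j > 4) for i, j in points]

w, err = hill_climb(points, labels)
print("errors of found split:", err)
print("errors of baseline axis-parallel split:",
      errors([1.0, 0.0, -0.5], points, labels))
```

On this toy data the hyperplane `[1.0, 1.0, -4.0]` separates the classes exactly, while every test of the form `x > t` or `y > t` misclassifies at least one point, which is the situation in which an oblique tree can be smaller than an axis-parallel one.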
Feature Selection for Classification
Intelligent Data Analysis, 1997
Cited by 285 (9 self)
Feature selection has been the focus of interest for quite some time and much work has been done. With the creation of huge databases and the consequent requirements for good machine learning techniques, new problems arise and novel approaches to feature selection are in demand. This survey is a comprehensive overview of many existing methods from the 1970's to the present. It identifies four steps of a typical feature selection method, and categorizes the different existing methods in terms of generation procedures and evaluation functions, and reveals hitherto unattempted combinations of generation procedures and evaluation functions. Representative methods are chosen from each category for detailed explanation and discussion via example. Benchmark datasets with different characteristics are used for comparative study. The strengths and weaknesses of different methods are explained. Guidelines for applying feature selection methods are given based on data types and domain characteris...
Operations for Learning with Graphical Models
Journal of Artificial Intelligence Research, 1994
Cited by 278 (13 self)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
Cited by 223 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
The Lack of A Priori Distinctions Between Learning Algorithms
1996
Cited by 165 (5 self)
This is the first of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low; the Vapnik-Chervonenkis dimension of your generalizer is small; and the trainin...
Separate-and-conquer rule learning
Artificial Intelligence Review, 1999
Cited by 165 (29 self)
This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of algorithms into a single framework and analyze them along three different dimensions, namely their search, language and overfitting avoidance biases.
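The separate-and-conquer strategy named in this abstract can be sketched roughly as follows: learn one rule that covers some positive examples and no negatives, remove ("separate") the covered positives, and repeat ("conquer") on the rest. This is a generic illustration under stated assumptions, not any specific system from the survey. The helper names, the precision-then-coverage heuristic, the attribute-value rule language, and the toy data are all hypothetical.

```python
def covers(rule, example):
    """A rule is a dict of attribute -> required value (a conjunction)."""
    return all(example[a] == v for a, v in rule.items())

def learn_one_rule(pos, neg, attributes):
    """Greedily add conditions until no negative example is covered."""
    rule = {}
    cov_pos, cov_neg = list(pos), list(neg)
    while cov_neg:
        best = None
        for a in attributes:
            if a in rule:
                continue
            # Only consider values seen in still-covered positives, so p >= 1.
            for v in sorted({e[a] for e in cov_pos}):
                p = sum(e[a] == v for e in cov_pos)
                n = sum(e[a] == v for e in cov_neg)
                score = (p / (p + n), p)  # precision first, then coverage
                if best is None or score > best[0]:
                    best = (score, a, v)
        if best is None:
            break  # every attribute is already used
        _, a, v = best
        rule[a] = v
        cov_pos = [e for e in cov_pos if e[a] == v]
        cov_neg = [e for e in cov_neg if e[a] == v]
    return rule

def separate_and_conquer(pos, neg, attributes):
    """Learn rules one at a time, removing the positives each one covers."""
    rules, remaining = [], list(pos)
    while remaining:
        rule = learn_one_rule(remaining, neg, attributes)
        if any(covers(rule, e) for e in neg):
            break  # inconsistent data: no pure rule exists for the rest
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules

# Toy data (assumed for illustration only).
pos = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
neg = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "yes"},
]
rules = separate_and_conquer(pos, neg, ["outlook", "windy"])
print(rules)
```

In the survey's terms, the choice of scoring heuristic is a search bias, the attribute-value conjunctions are a language bias, and the stopping test (here, requiring pure rules) is where an overfitting avoidance bias would be applied.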
Incremental Reduced Error Pruning
1994
Cited by 152 (23 self)
This paper outlines some problems that may occur with Reduced Error Pruning in Inductive Logic Programming, most notably efficiency. Thereafter a new method, Incremental Reduced Error Pruning, is proposed that attempts to address all of these problems. Experiments show that in many noisy domains this method is much more efficient than alternative algorithms, along with a slight gain in accuracy. However, the experiments show as well that the use of this algorithm cannot be recommended for domains with a very specific concept description. OEFAI-TR-94-09 1 Introduction Being able to deal with noisy data is a must for algorithms that are meant to learn concepts in real-world domains. Significant effort has gone into investigating the effect of noisy data on decision tree learning algorithms (see e.g. [Quinlan, 1993, Breiman et al., 1984]). Not surprisingly, noise handling methods have also entered the emerging field of Inductive Logic Programming (ILP) [Muggleton, 1992]. Linus [Lavr...
Data Mining with an Ant Colony Optimization Algorithm
IEEE Transactions on Evolutionary Computation, 2002
Cited by 121 (13 self)
This work proposes an algorithm for data mining called Ant-Miner (Ant Colony-based Data Miner). The goal of Ant-Miner is to extract classification rules from data. The algorithm is inspired by both research on the behavior of real ant colonies and some data mining concepts and principles. We compare the performance of Ant-Miner with CN2, a well-known data mining algorithm for classification, in six public domain data sets. The results provide evidence that: (a) Ant-Miner is competitive with CN2 with respect to predictive accuracy; and (b) the rule lists discovered by Ant-Miner are considerably simpler (smaller) than those discovered by CN2. Index Terms: Ant Colony Optimization, data mining, knowledge discovery, classification.
A survey of evolutionary algorithms for data mining and knowledge discovery
In: A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computation, 2002
Cited by 116 (3 self)
This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.
The Class Imbalance Problem: Significance and Strategies
In Proceedings of the 2000 International Conference on Artificial Intelligence (ICAI), 2000
Cited by 112 (5 self)
Although the majority of concept-learning systems previously designed usually assume that their training sets are well-balanced, this assumption is not necessarily correct. Indeed, there exist many domains for which one class is represented by a large number of examples while the other is represented by only a few. The purpose of this paper is 1) to demonstrate experimentally that, at least in the case of connectionist systems, class imbalances hinder the performance of standard classifiers and 2) to compare the performance of several approaches previously proposed to deal with the problem. 1 Introduction As the field of machine learning makes a rapid transition from the status of "academic discipline" to that of "applied science", a myriad of new issues, not previously considered by the machine learning community, is now coming into light. One such issue is the class imbalance problem. The class imbalance problem corresponds to domains for which one class is represented by a large n...
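A small sketch can show why class imbalance is a problem and what one rebalancing step looks like. The abstract does not say which approaches the paper compares, so the random oversampling shown here is an assumption chosen for illustration, as are the function names and the 95/5 toy split.

```python
import random
from collections import Counter

def majority_baseline(labels):
    """Accuracy and minority recall of always predicting the majority class."""
    counts = Counter(labels)
    majority, _ = counts.most_common(1)[0]
    accuracy = counts[majority] / len(labels)
    # The minority class is never predicted, so none of it is recovered.
    minority_recall = 0.0 if len(counts) > 1 else 1.0
    return accuracy, minority_recall

def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until every
    class is as large as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

# Toy data (assumed): 95 examples of class 0 and 5 of class 1.
examples = list(range(100))
labels = [0] * 95 + [1] * 5

print(majority_baseline(labels))
balanced_x, balanced_y = random_oversample(examples, labels)
print(Counter(balanced_y))
```

The baseline makes the issue visible: a classifier that ignores the rare class entirely still scores high accuracy, so accuracy alone can hide the failure that the paper sets out to measure.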