Bagging, Boosting, and C4.5
In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996
"... Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weight ..."
Abstract

Cited by 271 (1 self)
 Add to MetaCart
Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that learns decision trees and testing on a representative collection of datasets. While both approaches substantially improve predictive accuracy, boosting shows the greater benefit. On the other hand, boosting also produces severe degradation on some datasets. A small change to the way that boosting combines the votes of learned classifiers reduces this downside and also leads to slightly better results on most of the datasets considered. Introduction Designers of empirical machine learning systems are concerned with such issues as the computational cost of the learning method and the accuracy and ...
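A minimal sketch of the bagging half of this abstract, assuming a toy one-rule "stump" learner on a single numeric feature (the learner, names, and data below are invented for illustration, not the paper's C4.5 setup): bootstrap replicates are drawn from the training data, one model is trained per replicate, and predictions are combined by unweighted voting.

```python
import random
from collections import Counter

def train_stump(data):
    """Toy one-rule learner: pick the threshold and direction on a single
    numeric feature that best fit the (x, label) pairs."""
    best_thr, best_dir, best_acc = 0.0, 1, -1.0
    for thr, _ in data:
        for direction in (1, -1):
            acc = sum(int((x - thr) * direction > 0) == y
                      for x, y in data) / len(data)
            if acc > best_acc:
                best_thr, best_dir, best_acc = thr, direction, acc
    return best_thr, best_dir

def predict_stump(model, x):
    thr, direction = model
    return int((x - thr) * direction > 0)

def bagging(data, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap replicate of the data,
    then classify new points by majority vote over all stumps."""
    rng = random.Random(seed)
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    def vote(x):
        votes = Counter(predict_stump(m, x) for m in models)
        return votes.most_common(1)[0][0]
    return vote

# Toy data: class 1 iff the feature exceeds 5, plus one noisy label.
data = [(float(x), int(x > 5)) for x in range(11)]
data[3] = (3.0, 1)  # label noise
classify = bagging(data)
print(classify(1.0), classify(9.0))
```

Boosting differs from this sketch in exactly the way the abstract states: instead of uniform bootstrap sampling and uniform votes, it reweights training instances after each round and weights the classifiers' votes.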
Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm
Journal of Artificial Intelligence Research, 1995
"... This paper introduces ICET, a new algorithm for costsensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness ..."
Abstract

Cited by 155 (5 self)
 Add to MetaCart
This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness ...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract

Cited by 146 (1 self)
 Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction. 1. Introduction. Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects, e.g., the Human Genome...
Searching for Dependencies in Bayesian Classifiers
1996
"... Naive Bayesian classifiers which make independence assumptions perform remarkably well on some data sets but poorly on others. We explore ways to improve the Bayesian classifier by searching for dependencies among attributes. We propose and evaluate two algorithms for detecting dependencies among at ..."
Abstract

Cited by 69 (5 self)
 Add to MetaCart
Naive Bayesian classifiers which make independence assumptions perform remarkably well on some data sets but poorly on others. We explore ways to improve the Bayesian classifier by searching for dependencies among attributes. We propose and evaluate two algorithms for detecting dependencies among attributes and show that the backward sequential elimination and joining algorithm provides the most improvement over the naive Bayesian classifier. The domains on which the most improvement occurs are those domains on which the naive Bayesian classifier is significantly less accurate than a decision tree learner. This suggests that the attributes used in some common databases are not independent conditioned on the class and that the violations of the independence assumption that affect the accuracy of the classifier can be detected from training data. 23.1 Introduction The Bayesian classifier (Duda
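The "joining" operation this abstract refers to can be sketched as follows (a hypothetical minimal version, not the authors' code): two attribute columns are merged into one compound attribute whose values are pairs, so a naive Bayesian classifier no longer assumes those two attributes are independent given the class.

```python
def join_attributes(rows, i, j):
    """Replace attribute columns i and j in each row with a single
    compound attribute whose value is the (value_i, value_j) pair."""
    joined = []
    for row in rows:
        pair = (row[i], row[j])
        rest = [v for k, v in enumerate(row) if k not in (i, j)]
        joined.append(rest + [pair])
    return joined

# Hypothetical attribute rows: (outlook, humidity, class)
rows = [("sunny", "high", "no"), ("rain", "low", "yes")]
print(join_attributes(rows, 0, 1))
```

The backward sequential elimination and joining algorithm the abstract favors would repeatedly apply operations like this (and attribute deletions), keeping a change only when cross-validated accuracy improves.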
Lookahead and Pathology in Decision Tree Induction
Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1995
"... The standard approach to decision tree induction is a topdown, greedy algorithm that makes locally optimal, irrevocable decisions at each node of a tree. In this paper, we study an alternative approach, in which the algorithms use limited lookahead to decide what test to use at a node. We systemati ..."
Abstract

Cited by 52 (2 self)
 Add to MetaCart
The standard approach to decision tree induction is a top-down, greedy algorithm that makes locally optimal, irrevocable decisions at each node of a tree. In this paper, we study an alternative approach, in which the algorithms use limited lookahead to decide what test to use at a node. We systematically compare, using a very large number of decision trees, the quality of decision trees induced by the greedy approach to that of trees induced using lookahead. The main results of our experiments are: (i) the greedy approach produces trees that are just as accurate as trees produced with the much more expensive lookahead step; and (ii) decision tree induction exhibits pathology, in the sense that lookahead can produce trees that are both larger and less accurate than trees produced without it. 1. Introduction. The standard algorithm for constructing decision trees from a set of examples is greedy induction: a tree is induced top-down with locally optimal choices made at each node, with...
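The contrast between greedy test selection and one level of lookahead can be sketched as below. This is illustrative only: the scoring rule, function names, and the XOR-style toy data are assumptions, not the paper's experimental setup. Greedy selection picks the attribute with the highest immediate information gain; lookahead also credits an attribute for the best gains achievable one level deeper.

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def split(data, attr):
    """Partition (row, label) pairs by the value of one attribute."""
    groups = {}
    for row, y in data:
        groups.setdefault(row[attr], []).append((row, y))
    return list(groups.values())

def gain(data, attr):
    rem = sum(len(g) / len(data) * entropy([y for _, y in g])
              for g in split(data, attr))
    return entropy([y for _, y in data]) - rem

def best_greedy(data, attrs):
    """Greedy: highest immediate information gain."""
    return max(attrs, key=lambda a: gain(data, a))

def best_lookahead(data, attrs):
    """Depth-1 lookahead: immediate gain plus the best gain
    achievable in each child, weighted by child size."""
    def score(a):
        return gain(data, a) + sum(
            len(g) / len(data) * max((gain(g, b) for b in attrs if b != a),
                                     default=0.0)
            for g in split(data, a))
    return max(attrs, key=score)

# Toy XOR data: y = a XOR b; a third attribute is merely correlated with y.
data = []
for a in (0, 1):
    for b in (0, 1):
        y = a ^ b
        for i in range(4):
            c = y if i < 3 else 1 - y  # correlated but non-predictive attribute
            data.append(((a, b, c), y))

print(best_greedy(data, [0, 1, 2]), best_lookahead(data, [0, 1, 2]))
```

On this data the greedy criterion prefers the correlated attribute (index 2), while lookahead discovers that splitting on one of the XOR inputs makes the problem solvable one level down. Note this is the case where lookahead helps; the abstract's point is that on average it does not, and can even hurt.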
Simplifying Decision Trees: A Survey
1996
"... Induced decision trees are an extensivelyresearched solution to classification tasks. For many practical tasks, the trees produced by treegeneration algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpl ..."
Abstract

Cited by 38 (5 self)
 Add to MetaCart
Induced decision trees are an extensively researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy, and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification and summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree i...
Feature Selection via Discretization
IEEE Trans. Knowledge and Data Eng., 1997
"... Abstractâ€”Discretization can turn numeric attributes into discrete ones. Feature selection can eliminate some irrelevant and/or redundant attributes. Chi2 is a simple and general algorithm that uses the c 2 statistic to discretize numeric attributes repeatedly until some inconsistencies are found in ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
Discretization can turn numeric attributes into discrete ones. Feature selection can eliminate some irrelevant and/or redundant attributes. Chi2 is a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data. It achieves feature selection via discretization. It can handle mixed attributes, work with multiclass data, and remove irrelevant and redundant attributes. Index Terms: discretization, feature selection, pattern classification.
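The core merging step can be sketched in the style of ChiMerge, which Chi2 builds on (a simplified illustration with made-up class counts; the full Chi2 algorithm also tightens the significance level across phases and stops when the data becomes inconsistent): adjacent intervals whose class distributions are statistically similar, as measured by the χ² statistic, are merged.

```python
from collections import Counter

def chi2_stat(int1, int2):
    """Chi-square statistic for two adjacent intervals,
    each a dict mapping class label -> count."""
    classes = set(int1) | set(int2)
    total = sum(int1.values()) + sum(int2.values())
    stat = 0.0
    for interval in (int1, int2):
        n_i = sum(interval.values())
        for c in classes:
            col = int1.get(c, 0) + int2.get(c, 0)
            expected = n_i * col / total
            observed = interval.get(c, 0)
            if expected:
                stat += (observed - expected) ** 2 / expected
    return stat

def merge_intervals(intervals, threshold):
    """Repeatedly merge the adjacent pair with the lowest chi-square
    value while it stays under the significance threshold."""
    intervals = [dict(i) for i in intervals]
    while len(intervals) > 1:
        stats = [chi2_stat(intervals[k], intervals[k + 1])
                 for k in range(len(intervals) - 1)]
        k = min(range(len(stats)), key=stats.__getitem__)
        if stats[k] >= threshold:
            break
        merged = Counter(intervals[k]) + Counter(intervals[k + 1])
        intervals[k:k + 2] = [dict(merged)]
    return intervals

# Three intervals with hypothetical class counts; 3.84 is the
# chi-square critical value at the 0.05 level with 1 degree of freedom.
intervals = [{"a": 5, "b": 0}, {"a": 4, "b": 1}, {"a": 0, "b": 5}]
print(merge_intervals(intervals, threshold=3.84))
```

An attribute whose intervals all collapse into one carries no class information under this test, which is how discretization doubles as feature selection.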
Re-representing and Restructuring Domain Theories: A Constructive Induction Approach
Journal of Artificial Intelligence Research, 1995
"... Theory revision integrates inductive learning and background knowledge by combining training examples with a coarse domain theory to produce a more accurate theory. There are two challenges that theory revision and other theoryguided systems face. First, a representation language appropriate for th ..."
Abstract

Cited by 27 (0 self)
 Add to MetaCart
Theory revision integrates inductive learning and background knowledge by combining training examples with a coarse domain theory to produce a more accurate theory. There are two challenges that theory revision and other theory-guided systems face. First, a representation language appropriate for the initial theory may be inappropriate for an improved theory. While the original representation may concisely express the initial theory, a more accurate theory forced to use that same representation may be bulky, cumbersome, and difficult to reach. Second, a theory structure suitable for a coarse domain theory may be insufficient for a fine-tuned theory. Systems that produce only small, local changes to a theory have limited value for accomplishing complex structural alterations that may be required. Consequently, advanced theory-guided learning systems require flexible representation and flexible structure. An analysis of various theory revision systems and theory-guided learning systems ...
Constructing X-of-N Attributes for Decision Tree Learning
Machine Learning, 1998
"... . While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, XofN. An XofN representation is a set containing one or more attributevalue pairs. ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, X-of-N. An X-of-N representation is a set containing one or more attribute-value pairs. For a given instance, the value of an X-of-N representation corresponds to the number of its attribute-value pairs that are true of the instance. A single X-of-N representation can directly and simply represent any concept that can be represented by a single conjunctive, a single disjunctive, or a single M-of-N representation commonly used for constructive induction, and the reverse is not true. In this paper, we describe a constructive decision tree learning algorithm, called XofN. When building decision trees, this algorithm creates one X-of-N representation, either as a nominal attribute or as a numeric attribute, at each decision node. The construction of X-of-N representations is carrie...
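The value of an X-of-N representation as defined here is simple to compute; a minimal sketch (the attribute names and instance below are invented for illustration):

```python
def x_of_n_value(instance, pairs):
    """Value of an X-of-N attribute: the number of its attribute-value
    pairs that are true of the instance (a dict: attribute -> value)."""
    return sum(instance.get(attr) == val for attr, val in pairs)

def m_of_n_test(instance, pairs, m):
    """An M-of-N test is just a threshold on the same value,
    which is why X-of-N subsumes M-of-N."""
    return x_of_n_value(instance, pairs) >= m

# Hypothetical X-of-N set of three attribute-value pairs.
pairs = [("outlook", "sunny"), ("humidity", "high"), ("windy", True)]
day = {"outlook": "sunny", "humidity": "high", "windy": False}
print(x_of_n_value(day, pairs), m_of_n_test(day, pairs, 2))
```

Requiring the value to equal the set size gives a conjunction, and requiring it to be at least one gives a disjunction, which matches the subsumption claim in the abstract.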
Global Data Analysis and the Fragmentation Problem in Decision Tree Induction
In 9th European Conference on Machine Learning, 1997
"... We investigate an inherent limitation of topdown decision tree induction in which the continuous partitioning of the instance space progressively lessens the statistical support of every partial (i.e. disjunctive) hypothesis, known as the fragmentation problem. We show, both theoretically and e ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
We investigate an inherent limitation of top-down decision tree induction in which the continuous partitioning of the instance space progressively lessens the statistical support of every partial (i.e. disjunctive) hypothesis, known as the fragmentation problem. We show, both theoretically and empirically, how the fragmentation problem adversely affects predictive accuracy as variation r (a measure of concept difficulty) increases. Applying feature-construction techniques at every tree node, which we implement in a decision tree inducer, DALI, is shown to only partially solve the fragmentation problem. Our study illustrates how a more robust solution must also assess the value of each partial hypothesis by recurring to all available training data, an approach we name global data analysis, which decision tree induction alone is unable to accomplish. The value of global data analysis is evaluated by comparing modified versions of C4.5rules with C4.5 trees and DALI, on both artificial and real-world domains. Empirical results suggest the importance of combining both feature construction and global data analysis to solve the fragmentation problem.