Results 1-10 of 62
Rule Induction with CN2: Some Recent Improvements
, 1991
Cited by 327 (2 self)
The CN2 algorithm induces an ordered list of classification rules from examples using entropy as its search heuristic. In this short paper, we describe two improvements to this algorithm. Firstly, we present the use of the Laplacian error estimate as an alternative evaluation function and secondly, we show how unordered as well as ordered rules can be generated. We experimentally demonstrate significantly improved performances resulting from these changes, thus enhancing the usefulness of CN2 as an inductive tool. Comparisons with Quinlan's C4.5 are also made. Keywords: learning, rule induction, CN2, Laplace, noise. 1 Introduction: Rule induction from examples has established itself as a basic component of many machine learning systems, and has been the first ML technology to deliver commercially successful applications (e.g. the systems GASOIL [Slocombe et al., 1986], BMT [Hayes-Michie, 1990], and in process control [Leech, 1986]). The continuing development of inductive techniques is t...
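To make the Laplacian estimate mentioned in this abstract concrete, here is a minimal sketch of the commonly cited Laplace accuracy estimate for a rule, (n_class + 1) / (n_total + k). The function name and the example counts are hypothetical, and the abstract itself does not spell out the formula, so treat this as an illustration rather than the paper's exact evaluation function.

```python
def laplace_accuracy(n_class: int, n_total: int, k: int) -> float:
    """Laplace-corrected accuracy estimate of a rule (illustrative sketch).

    n_class: covered examples that belong to the predicted class
    n_total: all examples covered by the rule
    k:       number of classes in the domain
    """
    if k < 1 or n_total < 0 or not 0 <= n_class <= n_total:
        raise ValueError("counts must satisfy 0 <= n_class <= n_total and k >= 1")
    return (n_class + 1) / (n_total + k)


# Hypothetical counts in a two-class domain: a rule that is perfect on
# 2 examples versus a rule that is right on 95 of 100 examples.
small_rule = laplace_accuracy(n_class=2, n_total=2, k=2)     # 3 / 4
large_rule = laplace_accuracy(n_class=95, n_total=100, k=2)  # 96 / 102
print(small_rule, large_rule)
```

Under this estimate the rule with broad coverage scores higher than the rule that is perfect on only two examples, which is the kind of penalty on very specific rules that an entropy score alone does not provide.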
Overfitting Avoidance as Bias
, 1992
Cited by 122 (2 self)
Strategies for increasing predictive accuracy through selective pruning have been widely adopted by researchers in decision tree induction. It is easy to get the impression from research reports that there are statistical reasons for believing that these overfitting avoidance strategies do increase accuracy and that, as a research community, we are making progress toward developing powerful, general methods for guarding against overfitting in inducing decision trees. In fact, any overfitting avoidance strategy amounts to a form of bias and, as such, may degrade performance instead of improving it. If pruning methods have often proven successful in empirical tests, this is due, not to the methods, but to the choice of test problems. As examples in this article illustrate, overfitting avoidance strategies are not better or worse, but only more or less appropriate to specific application domains. We are not, and cannot be, making progress toward methods both powerful and general. The ...
The Effects of Training Set Size on Decision Tree Complexity
, 1997
Cited by 67 (10 self)
This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms that show that increasing training set size often results in a linear increase in tree size, even when that additional complexity results in no significant increase in classification accuracy.
Overcoming the myopia of inductive learning algorithms with RELIEFF
 Applied Intelligence
, 1997
Cited by 38 (12 self)
Current inductive machine learning algorithms typically use greedy search with limited lookahead. This prevents them from detecting significant conditional dependencies between the attributes that describe training objects. Instead of myopic impurity functions and lookahead, we propose to use RELIEFF, an extension of RELIEF developed by Kira and Rendell [10], [11], for heuristic guidance of inductive learning algorithms. We have reimplemented Assistant, a system for top-down induction of decision trees, using RELIEFF as an estimator of attributes at each selection step. The algorithm is tested on several artificial and several real-world problems and the results are compared with some other well-known machine learning algorithms. Excellent results on artificial data sets and two real-world problems show the advantage of the presented approach to inductive learning. Keywords: learning from examples, estimating attributes, impurity function, RELIEFF, empirical evaluation. 1. Introduction ...
Improving the AUC of Probabilistic Estimation Trees
 In Proc. of the 14th European Conf. on Machine Learning
, 2003
Cited by 32 (7 self)
In this work we investigate several issues in order to improve the performance of probabilistic estimation trees (PETs). First, we derive a new probability smoothing that takes into account the class distributions of all the nodes from the root to each leaf. Secondly, we introduce or adapt some new splitting criteria aimed at improving probability estimates rather than improving classification accuracy, and compare them with other accuracy-aimed splitting criteria.
Decision tree pruning as a search in the state space
 Proceedings of the 6th European Conference on Machine Learning (ECML-93)
, 1993
Cited by 24 (2 self)
This paper presents a study of one particular problem of decision tree induction, namely (post-)pruning, with the aim of finding a common framework for the plethora of pruning methods that have appeared in the literature. Given a tree T_max to prune, a state space is defined as the set of all subtrees of T_max to which only one operator, called the any-depth branch pruning operator, can be applied in several ways in order to move from one state to another. By introducing an evaluation function f defined on the set of subtrees, the problem of tree pruning can be cast as an optimization problem, and it is also possible to classify each post-pruning method according to both its search strategy and the kind of information exploited by f. Indeed, while some methods use only the training set in order to evaluate the accuracy of a decision tree, other methods exploit an additional pruning set that allows them to get less biased estimates of the predictive accuracy of a pruned tree. The introduction of the state space shows that very simple search strategies are used by the post-pruning methods considered. Finally, some empirical results allow theoretical observations on strengths and weaknesses of pruning methods to be better understood.
Constructing Bayesian finite mixture models by the EM algorithm
, 1997
Cited by 23 (13 self)
In this paper we explore the use of finite mixture models for building decision support systems capable of sound probabilistic inference. Finite mixture models have many appealing properties: they are computationally efficient in the prediction (reasoning) phase, they are universal in the sense that they can approximate any problem domain distribution, and they can handle multimodality well. We present a formulation of the model construction problem in the Bayesian framework for finite mixture models, and describe how Bayesian inference is performed given such a model. The model construction problem can be seen as missing data estimation and we describe a realization of the Expectation-Maximization (EM) algorithm for finding good models. To prove the feasibility of our approach, we report cross-validated empirical results on several publicly available classification problem datasets, and compare our results to corresponding results obtained by alternative techniques, such as neural netw...
Sparse Data and the Effect of Overfitting Avoidance in Decision Tree Induction
 In Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI-92
, 1992
Cited by 16 (2 self)
Overfitting avoidance in induction has often been treated as if it statistically increases expected predictive accuracy. In fact, there is no statistical basis for believing it will have this effect. Overfitting avoidance is simply a form of bias and, as such, its effect on expected accuracy depends, not on statistics, but on the degree to which this bias is appropriate to a problem-generating domain. This paper identifies one important factor that affects the degree to which the bias of overfitting avoidance is appropriate, namely the abundance of training data relative to the complexity of the relationship to be induced, and shows empirically how it determines whether such methods as pessimistic and cross-validated cost-complexity pruning will increase or decrease predictive accuracy in decision tree induction. The effect of sparse data is illustrated first in an artificial domain and then in more realistic examples drawn from the UCI machine learning database repository. Introduction I...
Why Discretization Works for Naive Bayesian Classifiers
, 2000
Cited by 16 (1 self)
This paper explains why well-known discretization methods, such as entropy-based and ten-bin, work well for naive Bayesian classifiers with continuous variables, regardless of their complexities. These methods usually assume that discretized variables have Dirichlet priors. Since perfect aggregation holds for Dirichlets, we can show that, generally, a wide variety of discretization methods can perform well with insignificant difference. We identify situations where discretization may cause performance degradation and show that they are unlikely to happen for well-known methods. We empirically test our explanation with synthesized and real data sets and obtain confirming results. Our analysis leads to a lazy discretization method that can simplify the training for naive Bayes. This new method can perform as well as well-known methods in our experiment. 1. Introduction: Learning a naive Bayesian classifier (a.k.a. naive Bayes) (Langley et al., 1992) from data is an ...
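As a rough illustration of what a "ten-bin" discretization could look like, the sketch below assumes it means splitting the observed range of a continuous variable into ten equal-width intervals; the abstract does not define the method, and the function name `ten_bin` and the sample values are hypothetical.

```python
def ten_bin(values, n_bins=10):
    """Map each continuous value to an equal-width bin index in [0, n_bins - 1].

    Assumption: bins span the observed [min, max] range of `values`.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; use a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical measurements on a 0-100 scale.
bins = ten_bin([0, 25, 50, 99, 100])
print(bins)
```

A naive Bayes learner would then estimate one class-conditional probability per bin index instead of fitting a density to the raw values.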
Induction of decision trees using RELIEFF
, 1995
Cited by 14 (3 self)
In the context of machine learning from examples this paper deals with the problem of estimating the quality of attributes with and without dependencies between them. Greedy search prevents current inductive machine learning algorithms from detecting significant dependencies between the attributes. Recently, Kira and Rendell developed the RELIEF algorithm for estimating the quality of attributes that is able to detect dependencies between attributes. We show a strong relation between RELIEF's estimates and impurity functions, which are usually used for heuristic guidance of inductive learning algorithms. We propose to use RELIEFF, an extended version of RELIEF, instead of myopic impurity functions. We have reimplemented Assistant, a system for top-down induction of decision trees, using RELIEFF as an estimator of attributes at each selection step. The algorithm is tested on several artificial and several real-world problems. Results show the advantage of the presented approach to inductive lea...
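The claim that RELIEF can detect dependencies that myopic impurity functions miss can be illustrated with a small sketch of basic two-class RELIEF on an XOR problem. This is a simplified variant written for illustration, not the RELIEFF extension from the paper: it assumes binary attributes, visits every instance once instead of sampling at random, and breaks distance ties by index order. All names are hypothetical.

```python
from itertools import product


def relief_weights(X, y):
    """Basic two-class RELIEF for binary attributes (illustrative sketch)."""
    n = len(X)
    weights = [0.0] * len(X[0])

    def distance(a, b):
        # Hamming distance between two instances.
        return sum(ai != bi for ai, bi in zip(a, b))

    for i in range(n):
        same = [j for j in range(n) if j != i and y[j] == y[i]]
        other = [j for j in range(n) if y[j] != y[i]]
        hit = min(same, key=lambda j: distance(X[i], X[j]))    # nearest hit
        miss = min(other, key=lambda j: distance(X[i], X[j]))  # nearest miss
        for a in range(len(weights)):
            # Reward attributes that differ on the nearest miss,
            # penalise attributes that differ on the nearest hit.
            diff_miss = int(X[i][a] != X[miss][a])
            diff_hit = int(X[i][a] != X[hit][a])
            weights[a] += (diff_miss - diff_hit) / n
    return weights


# Class is the XOR of attributes 0 and 1; attribute 2 is irrelevant.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [x[0] ^ x[1] for x in X]
weights = relief_weights(X, y)
print(weights)
```

On this data each attribute taken alone leaves the classes split evenly, so a one-attribute impurity measure cannot separate the two XOR inputs from the irrelevant attribute, whereas the RELIEF weights of attributes 0 and 1 come out above that of attribute 2.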