Fast Effective Rule Induction, 1995
Cited by 800 (19 self)
Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.
Separate-and-conquer rule learning - Artificial Intelligence Review, 1999
Cited by 118 (29 self)
This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of algorithms into a single framework and analyze them along three different dimensions, namely their search, language and overfitting avoidance biases.
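The separate-and-conquer strategy named above can be illustrated with a minimal Python sketch. It is not taken from the survey: the rule representation (a conjunction of attribute = value tests), the precision-based greedy search, the choice to remove only covered positive examples, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Attrs = Dict[str, str]
Example = Tuple[Attrs, bool]   # (attribute values, is_positive)
Rule = Dict[str, str]          # conjunction of attribute == value tests (assumed form)


def covers(rule: Rule, attrs: Attrs) -> bool:
    """A rule covers an example when every test in the conjunction holds."""
    return all(attrs.get(a) == v for a, v in rule.items())


def precision(rule: Rule, examples: List[Example]) -> float:
    """Fraction of covered examples that are positive (0.0 if none are covered)."""
    covered = [pos for attrs, pos in examples if covers(rule, attrs)]
    return sum(covered) / len(covered) if covered else 0.0


def learn_one_rule(examples: List[Example]) -> Rule:
    """Conquer step: greedily add the test that most improves precision."""
    rule: Rule = {}
    while precision(rule, examples) < 1.0:
        best, best_score = None, precision(rule, examples)
        candidates = {
            (a, v)
            for attrs, _ in examples
            if covers(rule, attrs)
            for a, v in attrs.items()
            if a not in rule
        }
        for a, v in sorted(candidates):      # sorted for deterministic tie-breaking
            score = precision({**rule, a: v}, examples)
            if score > best_score:
                best, best_score = (a, v), score
        if best is None:                     # no test improves the rule
            break
        rule[best[0]] = best[1]
    return rule


def separate_and_conquer(examples: List[Example]) -> List[Rule]:
    """Learn rules one at a time until no positive example is left uncovered."""
    rules: List[Rule] = []
    remaining = list(examples)
    while any(pos for _, pos in remaining):
        rule = learn_one_rule(remaining)
        if not any(pos and covers(rule, attrs) for attrs, pos in remaining):
            break
        rules.append(rule)
        # Separate step: drop the positives this rule covers, keep all negatives
        # (an assumed variant; other systems remove every covered example).
        remaining = [e for e in remaining if not (e[1] and covers(rule, e[0]))]
    return rules


# Hypothetical toy data, for illustration only.
TOY: List[Example] = [
    ({"outlook": "sunny", "windy": "no"}, True),
    ({"outlook": "sunny", "windy": "yes"}, True),
    ({"outlook": "rain", "windy": "no"}, True),
    ({"outlook": "rain", "windy": "yes"}, False),
    ({"outlook": "overcast", "windy": "yes"}, False),
]
rules = separate_and_conquer(TOY)
```

The search heuristic, the rule language, and the stopping criterion are the parts a real system would vary; they correspond loosely to the search, language, and overfitting avoidance biases the survey analyzes.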
Efficient Pruning Methods For Separate-And-Conquer Rule Learning Systems - In Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993
Cited by 42 (2 self)
Recent years have seen increased interest in systems that learn sets of rules. The goal of this paper is to study the degree to which "separate and conquer" rule learning induction methods scale up to large, real-world learning problems. In particular
Pruning Algorithms for Rule Learning, 1997
Cited by 40 (14 self)
Pre-pruning and post-pruning are two standard techniques for handling noise in decision tree learning. Pre-pruning deals with noise during learning, while post-pruning addresses this problem after an overfitting theory has been learned. We first review several adaptations of pre- and post-pruning techniques for separate-and-conquer rule learning algorithms and discuss some fundamental problems. The primary goal of this paper is to show how to solve these problems with two new algorithms that combine and integrate pre- and post-pruning.
Inductive Policy: The Pragmatics of Bias Selection - Machine Learning, 1995
Cited by 37 (9 self)
This paper extends the currently accepted model of inductive bias by identifying six categories of bias and separates inductive bias from the policy for its selection (the inductive policy). We analyze existing "bias selection" systems, examining the similarities and differences in their inductive policies, and identify three techniques useful for building inductive policies. We then present a framework for representing and automatically selecting a wide variety of biases and describe experiments with an instantiation of the framework addressing various pragmatic tradeoffs of time, space, accuracy, and the cost of errors. The experiments show that a common framework can be used to implement policies for a variety of different types of bias selection, such as parameter selection, term selection, and example selection, using similar techniques. The experiments also show that different tradeoffs can be made by the implementation of different policies; for example, from the same data different rule sets can be learned based on different tradeoffs of accuracy versus the cost of erroneous predictions.
Simplifying Decision Trees: A Survey, 1996
Cited by 32 (5 self)
Induced decision trees are an extensively-researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification and summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree i...
An Exact Probability Metric for Decision Tree Splitting - Machine Learning, 1997
Cited by 27 (3 self)
ID3's information gain heuristic is well-known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5. Several alternatives have been proposed, notably orthogonality and Beta. Gain ratio and orthogonality are strongly correlated, and all of the metrics share a common bias towards splits with one or more small expected values, under circumstances where the split likely occurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the posterior probability of the null hypothesis. Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules are traced to coupling these biased approximations with one another or with arbitrary thresholds; problems which are overcome by the hypergeometric. Em...
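The bias of information gain toward multi-valued attributes, and the partial compensation offered by gain ratio, can be seen on a small example. The Python sketch below uses the textbook definitions of entropy, information gain, and gain ratio; the eight-example data set and the attribute names are hypothetical, and the paper's hypergeometric metric is not implemented here.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy obtained by splitting on the attribute values."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical data: eight examples, four per class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
record_id = list(range(8))                            # unique per example, no predictive value
binary = ["a", "a", "a", "b", "b", "b", "b", "a"]     # weakly informative two-valued attribute

gain_id, gain_bin = information_gain(record_id, labels), information_gain(binary, labels)
ratio_id, ratio_bin = gain_ratio(record_id, labels), gain_ratio(binary, labels)
```

On this data the identifier attribute obtains the maximum possible gain of 1 bit even though it cannot generalize. Gain ratio shrinks that score to 1/3, yet it still ranks the identifier above the binary attribute, which is consistent with the abstract's remark that the compensation is only partial.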
Small Sample Statistics for Classification Error Rates I: Error Rate Measurements - Dept. of Inf. and Comp. Sci, 1996
Cited by 20 (1 self)
Several methods (independent subsamples, leave-one-out, cross-validation, and bootstrapping) have been proposed for estimating the error rates of classifiers. The rationale behind the various estimators and the causes of the sometimes conflicting claims regarding their bias and precision are explored in this paper. The biases and variances of each of the estimators are examined empirically. Cross-validation, 10-fold or greater, seems to be the best approach; the other methods are biased, have poorer precision, or are inconsistent. Though unbiased for linear discriminant classifiers, the 632b bootstrap estimator is biased for nearest neighbors classifiers, more so for single nearest neighbor than for three nearest neighbors. The 632b estimator is also biased for Cart-style decision trees. Weiss' loo* estimator is unbiased and has better precision than cross-validation for discriminant and nearest neighbors classifiers, but its lack of bias and improved precision for those classifiers do...
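As a rough illustration of the k-fold cross-validation estimator discussed above, the following Python sketch estimates the error rate of a single-nearest-neighbor classifier. The function names, the squared Euclidean distance, the shuffling seed, and the toy data are assumptions for this sketch; it does not reproduce the paper's experimental setup, and the bootstrap and loo* estimators are not shown.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[Sequence[float], int]   # (feature vector, class label)


def nearest_neighbor_predict(train: List[Point], x: Sequence[float]) -> int:
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], x))[1]


def cross_val_error(data: List[Point], k: int = 10, seed: int = 0) -> float:
    """Estimate the error rate by k-fold cross-validation.

    Every example is held out exactly once; k == len(data) gives leave-one-out.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k disjoint folds of near-equal size
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        errors += sum(
            nearest_neighbor_predict(train, data[i][0]) != data[i][1] for i in fold
        )
    return errors / len(data)


# Hypothetical toy data: two well-separated one-dimensional clusters.
clean: List[Point] = (
    [((i * 0.1,), 0) for i in range(10)] + [((10 + i * 0.1,), 1) for i in range(10)]
)
error = cross_val_error(clean, k=10)
```

Because the folds partition the data, the returned value is the fraction of examples misclassified when each is predicted by a model that never saw it during training.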
R-MINI: An Iterative Approach for Generating Minimal Rules from Examples, 1997
Cited by 18 (3 self)
Generating classification rules or decision trees from examples has been a subject of intense study in the pattern recognition community, the statistics community and the machine learning community of the artificial intelligence area. We pursue a point of view that minimality of rules is important, perhaps above all other considerations (biases) that come into play in generating rules. We present a new minimal rule generation algorithm called R-MINI (Rule-MINI) that is an adaptation of a well-established heuristic switching function minimization technique, MINI. The main mechanism that reduces the number of rules is repeated application of generalization and specialization operations to the rule set while maintaining completeness and consistency. R-MINI results on some benchmark cases are also presented.
I. Introduction
There are many approaches to generating Disjunctive Normal Form (DNF) rules from examples. The Aq family of rule generation and other approaches [1-4] incrementally c...
Discovering Probabilistic Decision Rules - Int Journal of Approximate Reasoning, 1993
Cited by 8 (4 self)
Techniques to generate probabilistic decision rules are presented. These techniques are used to forecast or measure the competitiveness of companies. Rules estimating the competitiveness of companies are discovered. The generated rules are then applied to forecast the competitiveness of previously unseen companies. Experimental results show that the probabilistic decision rule technique outperforms many other machine learning and statistical techniques in this application domain. These findings are further confirmed in a second application, the classification of credits into either good or bad credits.
1 Introduction
Companies all over the world use databases to store a large quantity and variety of information. Much research has been conducted on the efficient retrieval of raw data from these databases. For many applications, it is as important to discover new knowledge from the data as it is to retrieve the raw data itself. For example, let us assume that a credit-banker has access to...

