Results 1–10 of 112
Generating Accurate Rule Sets Without Global Optimization
 In: Proc. of the 15th Int. Conference on Machine Learning
, 1998
Abstract

Cited by 191 (7 self)
The two dominant schemes for rule learning, C4.5 and RIPPER, both operate in two stages. First they induce an initial rule set, and then they refine it using a rather complex optimization stage that discards (C4.5) or adjusts (RIPPER) individual rules to make them work better together. In contrast, this paper shows how good rule sets can be learned one rule at a time, without any need for global optimization. We present an algorithm for inferring rules by repeatedly generating partial decision trees, thus combining the two major paradigms for rule generation: creating rules from decision trees, and the separate-and-conquer rule-learning technique. The algorithm is straightforward and elegant; despite this, experiments on standard datasets show that it produces rule sets that are as accurate as and of similar size to those generated by C4.5, and more accurate than RIPPER's. Moreover, it operates efficiently, and because it avoids post-processing, it does not suffer the extremely slow performance on pathological example sets for which the C4.5 method has been criticized.
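The contrast the abstract draws can be made concrete with a minimal sketch of the generic separate-and-conquer loop it refers to: learn one rule, remove the examples it covers, repeat. This is an illustrative toy, not PART's partial-tree method; rules are conjunctions of equality tests grown greedily by precision, and all names here are hypothetical.

```python
def covers(rule, x):
    """A rule is a list of (attribute, value) equality tests."""
    return all(x[a] == v for a, v in rule)

def learn_one_rule(examples, target):
    """Greedily add the equality test that maximizes precision on `target`."""
    rule, pool = [], list(examples)
    while any(y != target for _, y in pool):
        best, best_prec = None, -1.0
        for x, _ in pool:
            for cond in x.items():
                if cond in rule:
                    continue
                cov = [(x2, y2) for x2, y2 in pool if x2.get(cond[0]) == cond[1]]
                prec = sum(1 for _, y2 in cov if y2 == target) / len(cov)
                if prec > best_prec:
                    best, best_prec = cond, prec
        if best is None:
            break  # no test left to add; accept the impure rule
        rule.append(best)
        pool = [e for e in pool if covers(rule, e[0])]
    return rule

def separate_and_conquer(examples, target):
    """Learn rules one at a time, removing covered examples after each rule."""
    rules, remaining = [], list(examples)
    while any(y == target for _, y in remaining):
        rule = learn_one_rule(remaining, target)
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e[0])]
    return rules
```

C4.5's and RIPPER's post-processing stages would then revisit the learned rule set; the paper's point is that the one-rule-at-a-time scheme above can already produce accurate rule sets without that global step.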
A Perspective on Inductive Logic Programming
Abstract

Cited by 56 (8 self)
The state-of-the-art in inductive logic programming is surveyed by analyzing the approach taken by this field over the past 8 years. The analysis investigates the roles of 1) logic programming and machine learning, 2) theory, techniques and applications, and 3) various technical problems addressed within inductive logic programming.

1 Introduction

The term inductive logic programming was first coined by Stephen Muggleton in 1990 [1]. Inductive logic programming is concerned with the study of inductive machine learning within the representations offered by computational logic. Since 1991, annual international workshops have been organized [28]. This paper is an attempt to analyze the developments within this field. Particular attention is devoted to the relation between inductive logic programming and its neighboring fields, such as machine learning, computational logic and data mining, and to the role that theory, techniques and implementations, and applications play. The analysis...
A Case Study in Using Linguistic Phrases for Text Categorization on the WWW
 In Working Notes of the AAAI/ICML Workshop on Learning for Text Categorization
, 1998
Abstract

Cited by 49 (9 self)
Most learning algorithms that are applied to text categorization problems rely on a bag-of-words document representation, i.e., each word occurring in the document is considered as a separate feature. In this paper, we investigate the use of linguistic phrases as input features for text categorization problems. These features are based on information extraction patterns that are generated and used by the AutoSlog-TS system. We present experimental results on using such features as background knowledge for two machine learning algorithms on a classification task on the WWW. The results show that phrasal features can improve the precision of learned theories at the expense of coverage.
ROC 'n' Rule Learning – Towards a Better Understanding of Covering Algorithms
 Machine Learning
, 2005
Abstract

Cited by 48 (13 self)
This paper provides an analysis of the behavior of separate-and-conquer or covering rule learning algorithms by visualizing their evaluation metrics and their dynamics in PN-space, a variant of ROC-space. Our results show that most commonly used search heuristics, including accuracy, weighted relative accuracy, entropy, and Gini index, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes. Furthermore, our results show that stopping and filtering criteria like CN2's significance test focus on identifying significant deviations from random classification, which does not necessarily avoid overfitting. We also identify a problem with Foil's MDL-based encoding length restriction, which proves to be largely equivalent to a variable threshold on the recall of the rule. In general, we interpret these results as evidence that, contrary to common conception, pre-pruning heuristics are not very well understood and deserve more investigation.
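In PN-space coordinates (p covered positives and n covered negatives, out of P and N totals), the two prototype heuristics are easy to state. The sketch below also checks that rule accuracy ranks candidate rules exactly like the cost-weighted difference with unit costs, which is the flavor of equivalence the paper establishes; the function names are mine, not the paper's.

```python
def precision(p, n):
    """Fraction of the covered examples that are positive."""
    return p / (p + n) if p + n else 0.0

def cost_weighted_diff(p, n, c=1.0):
    """Covered positives minus cost-weighted covered negatives."""
    return p - c * n

def rule_accuracy(p, n, P, N):
    """Accuracy of the rule used as a classifier: covered positives
    plus uncovered negatives, over all P + N examples."""
    return (p + (N - n)) / (P + N)
```

Since rule_accuracy equals (p - n + N) / (P + N), for fixed P and N it is a monotone function of p - n, so it orders candidate rules identically to cost_weighted_diff with c = 1.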
An Analysis of Rule Evaluation Metrics
 Proceedings of the 20th International Conference on Machine Learning (ICML-03)
, 2003
Abstract

Cited by 41 (11 self)
In this paper we analyze the most popular evaluation metrics for separate-and-conquer rule learning algorithms. Our results show that all commonly used heuristics, including accuracy, weighted relative accuracy, entropy, Gini index and information gain, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes.
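The m-estimate generalization mentioned in the final sentence can be sketched as follows (P and N are the totals of positive and negative examples; this is the standard m-estimate form, assumed here rather than taken from the paper). At m = 0 it reduces to precision, and as m grows it pulls the estimate toward the class prior, behaving increasingly like the linear cost-weighted prototype.

```python
def m_estimate(p, n, P, N, m):
    """m-estimate of a rule's precision, smoothed toward the prior P/(P+N)."""
    prior = P / (P + N)
    return (p + m * prior) / (p + n + m)
```

For a rule covering 8 positives and 2 negatives out of 10 each, m = 0 gives plain precision 0.8, while very large m approaches the prior 0.5, with intermediate m interpolating between the two.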
Well-Trained PETs: Improving Probability Estimation Trees
, 2000
Abstract

Cited by 36 (6 self)
Decision trees are one of the most effective and widely used classification methods. However, many applications require class probability estimates, and probability estimation trees (PETs) have the same attractive features as classification trees (e.g., comprehensibility, accuracy and efficiency in high dimensions and on large data sets). Unfortunately, decision trees have been found to provide poor probability estimates. Several techniques have been proposed to build more accurate PETs, but, to our knowledge, there has not been a systematic experimental analysis of which techniques actually improve the probability estimates, and by how much. In this paper we first discuss why the decision-tree representation is not intrinsically inadequate for probability estimation. Inaccurate probabilities are partially the result of decision-tree induction algorithms that focus on maximizing classification accuracy and minimizing tree size (for example via reduced-error pruning). Larger tree...
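One widely studied way to improve raw leaf frequencies is Laplace correction, which smooths the estimate toward a uniform prior; whether this matches the specific techniques the paper evaluates is an assumption here, and the function name is hypothetical.

```python
def laplace_leaf_probs(class_counts, num_classes):
    """Laplace-corrected class probabilities at a decision-tree leaf.
    class_counts maps each class to its count of training examples at
    the leaf; a pure leaf no longer claims probability 1.0."""
    total = sum(class_counts.values())
    return {c: (k + 1) / (total + num_classes)
            for c, k in class_counts.items()}
```

For example, a leaf holding 3 positives and 0 negatives gets a raw estimate of 1.0, whereas the Laplace-corrected estimate is (3 + 1) / (3 + 2) = 0.8, which is far less overconfident for such a small leaf.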
Machine Discoveries: A Few Simple, Robust Local Expression Principles
 Journal of New Music Research
, 2001
Abstract

Cited by 35 (10 self)
The paper presents a new approach to discovering general rules of expressive music performance from real performance data via inductive machine learning. A new learning algorithm is briefly presented, and then an experiment with a very large data set (performances of 13 Mozart piano sonatas) is described. The new learning algorithm succeeds in discovering some extremely simple and general principles of musical performance (at the level of individual notes), in the form of categorical prediction rules. These rules turn out to be very robust and general: when tested on performances by a different pianist and even on music of a different style (Chopin), they exhibit a surprisingly high degree of predictive accuracy.
Discovering Interesting Patterns for Investment Decision Making with GLOWER – A Genetic Learner Overlaid With Entropy Reduction
, 2000
Abstract

Cited by 31 (0 self)
Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or nonexistent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables, thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorpo...
Discovering Simple Rules in Complex Data: A Meta-Learning Algorithm and Some Surprising Musical Discoveries
 Artificial Intelligence
, 2001
Abstract

Cited by 28 (8 self)
This article presents a new rule discovery algorithm named PLCG that can find simple, robust partial rule models (sets of classification rules) in complex data where it is difficult or impossible to find models that completely account for all the phenomena of interest. Technically speaking,
Delegating Classifiers
 In Proc. 21st International Conference on Machine Learning
, 2004
Abstract

Cited by 28 (1 self)
A sensible use of classifiers must be based on the estimated reliability of their predictions. A cautious classifier would delegate the difficult or uncertain predictions to other, possibly more specialised, classifiers. In this paper we analyse and develop this idea of delegating classifiers in a systematic way. First, we design a two-step scenario where a first classifier chooses which examples to classify and delegates the difficult examples to train a second classifier. Second, we present an iterated scenario involving an arbitrary number of chained classifiers. We compare these scenarios to classical ensemble methods, such as bagging and boosting. We show experimentally that our approach is not far behind these methods in terms of accuracy, but with several advantages: (i) improved efficiency, since each classifier learns from fewer examples than the previous one; (ii) improved comprehensibility, since each classification derives from a single classifier; and (iii) the possibility to simplify the overall multi-classifier by removing the parts that lead to delegation.
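The prediction side of the two-step scenario can be sketched with classifiers modeled as functions returning a (label, confidence) pair; the threshold and all names below are hypothetical, not taken from the paper.

```python
def delegating_predict(first, second, x, threshold=0.8):
    """Let `first` classify x unless its confidence is below `threshold`,
    in which case the prediction is delegated to `second`."""
    label, confidence = first(x)
    if confidence >= threshold:
        return label, "first"
    return second(x)[0], "second"
```

On the training side, `second` would be fitted only on the examples that `first` delegates, which is why each classifier in the chain learns from fewer examples than the previous one, and why each final prediction still derives from a single classifier.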