Results 11-20 of 53
Learning Rules with Local Exceptions
 in European Conference on Computational Theory
, 1993
Abstract

Cited by 10 (0 self)
We present a learning algorithm for rule-based concept representations called ripple-down rule sets. Ripple-down rule sets allow us to deal with the exceptions for each rule separately by introducing exception rules, exception rules for each exception rule, and so on, up to a constant depth. These local exception rules are in contrast to decision lists, in which the exception rules must be placed into a global ordering of the rules. The localization of exceptions makes it possible to represent concepts that have no decision list representation. On the other hand, decision lists with a constant number of alternations between rules for different classes can be represented by constant-depth ripple-down rule sets with only a polynomial increase in size. Our algorithm is an Occam algorithm for constant-depth ripple-down rule sets and, hence, a PAC learning algorithm. It is based on repeatedly applying the greedy approximation method for the weighted set cover problem to find good exception rule set...
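To make the representation concrete, here is a minimal sketch of how a ripple-down rule with local exceptions is evaluated. The helper names (`make_rule`, `apply_rule`) and the toy rules are illustrative assumptions, not the paper's learning algorithm; the point is only that an exception is attached locally to its parent rule rather than placed in a global ordering.

```python
# A sketch of ripple-down rule evaluation: each rule carries a condition,
# a predicted label, and local exception rules that may themselves carry
# exceptions, up to a fixed depth.

def make_rule(condition, label, exceptions=()):
    """A rule is (condition, label, exception rules). A condition maps an
    example (a dict of attribute values) to True/False."""
    return {"cond": condition, "label": label, "exceptions": list(exceptions)}

def apply_rule(rule, example):
    """Return the rule's label for `example`, letting the first firing
    exception (applied recursively) override the parent's label.
    Returns None if the rule's own condition does not fire."""
    if not rule["cond"](example):
        return None
    for exc in rule["exceptions"]:
        verdict = apply_rule(exc, example)
        if verdict is not None:
            return verdict          # a local exception overrides this rule
    return rule["label"]

# Toy example: "rainy => stay home", except "rainy and umbrella => go out".
umbrella_exc = make_rule(lambda e: e["umbrella"], "go out")
rainy_rule = make_rule(lambda e: e["rainy"], "stay home", [umbrella_exc])

print(apply_rule(rainy_rule, {"rainy": True, "umbrella": False}))  # stay home
print(apply_rule(rainy_rule, {"rainy": True, "umbrella": True}))   # go out
```

Note how the exception stays attached to the rainy rule; a decision list would instead have to place the umbrella rule before the rainy rule in one global order.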
Seer: Maximum Likelihood Regression for Learning-Speed Curves
 University of Illinois at
, 1995
Abstract

Cited by 10 (0 self)
The research presented here focuses on modeling machine-learning performance. The thesis introduces Seer, a system that generates empirical observations of classification-learning performance and then uses those observations to create statistical models. The models can be used to predict the number of training examples needed to achieve a desired level of accuracy and the maximum accuracy possible given an unlimited number of training examples. Seer advances the state of the art with 1) models that embody the best constraints for classification learning and the most useful parameters, 2) algorithms that efficiently find maximum-likelihood models, and 3) a demonstration on real-world data from three domains of a practicable application of such modeling. The first part of the thesis gives an overview of the requirements for a good maximum-likelihood model of classification-learning performance. Next, reasonable design choices for such models are explored. Selection among such models is a task of nonlinear programming, but by exploiting appropriate problem constraints, the task is reduced to a nonlinear regression task that can be solved with an efficient iterative algorithm. The latter part of the thesis describes almost 100 experiments in the domains of soybean disease, heart disease, and audiological problems. The tests show that Seer is excellent at characterizing learning performance and that it seems to be as good as possible at predicting learning
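The flavor of the regression task can be sketched as follows. The curve family acc(m) = a - b*m^(-c) and the `fit_curve` helper are assumptions for illustration (Seer's actual model families and maximum-likelihood machinery are more elaborate); the sketch exploits the same kind of structure the abstract describes, reducing the nonlinear fit to a linear one for each fixed shape parameter c.

```python
# Fit a learning-speed curve acc(m) = a - b * m**(-c) to observations of
# (training-set size, accuracy).  For fixed c the model is linear in (a, b),
# so we grid-search c and solve ordinary least squares in closed form.

def fit_curve(sizes, accs, c_grid=None):
    """Return (a, b, c) minimizing the sum of squared residuals."""
    c_grid = c_grid or [i / 100 for i in range(10, 201)]
    best = None
    n = len(sizes)
    for c in c_grid:
        x = [m ** (-c) for m in sizes]          # regressor for this fixed c
        sx, sy = sum(x), sum(accs)
        sxx = sum(xi * xi for xi in x)
        sxy = sum(xi * yi for xi, yi in zip(x, accs))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue
        b = -(n * sxy - sx * sy) / denom        # slope of y on x is -b
        a = (sy + b * sx) / n
        sse = sum((a - b * xi - yi) ** 2 for xi, yi in zip(x, accs))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]

# Synthetic observations generated from a = 0.9, b = 0.8, c = 0.5:
sizes = [10, 20, 50, 100, 200, 500]
accs = [0.9 - 0.8 * m ** -0.5 for m in sizes]
a, b, c = fit_curve(sizes, accs)
print(round(a, 2), round(b, 2), round(c, 2))  # recovers 0.9 0.8 0.5
```

The recovered parameter `a` plays the role of the maximum attainable accuracy, and inverting the fitted curve predicts the training-set size needed for a target accuracy.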
In Defense of C4.5: Notes on Learning One-Level Decision Trees
 Proc. of the 11th Int. Conf. on Machine Learning
, 1994
Abstract

Cited by 9 (1 self)
We discuss the implications of Holte's recently published article, which demonstrated that on the most commonly used data very simple classification rules are almost as accurate as the decision trees produced by Quinlan's C4.5. We consider, in particular, the significance of Holte's results for the future of top-down induction of decision trees. To an extent, Holte questioned the sense of further research on multi-level decision tree learning. We go through all the parts of Holte's study in detail and try to put the results into perspective. We argue that the (in absolute terms) small difference in accuracy between 1R and C4.5 that was witnessed by Holte is still significant. We claim that C4.5 possesses additional accuracy-related advantages over 1R. In addition, we discuss the representativeness of the databases used by Holte. We compare empirically the optimal accuracies of multi-level and one-level decision trees and observe some significant differences. We point out several defici...
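For readers unfamiliar with the baseline under discussion, the 1R learner from Holte's article admits a very short sketch: for each attribute, map every observed value to its majority class, then keep the single attribute whose one-level rule makes the fewest training errors. The function name and toy data below are illustrative.

```python
# A minimal sketch of Holte's 1R (one-level decision tree) learner.

from collections import Counter, defaultdict

def one_r(examples, labels, n_attrs):
    """examples: list of attribute-value tuples.  Returns (best_attr, rule),
    where rule maps each value of best_attr to its majority class."""
    best = None
    for a in range(n_attrs):
        counts = defaultdict(Counter)           # value -> class counts
        for x, y in zip(examples, labels):
            counts[x[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(rule[x[a]] != y for x, y in zip(examples, labels))
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best[1], best[2]

# Toy data: the class equals attribute 0; attribute 1 is noise.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 1, 1]
attr, rule = one_r(X, y, 2)
print(attr, rule)  # attribute 0 with rule {0: 0, 1: 1}
```

The contrast the abstract draws is between this single-split rule and the multi-level trees C4.5 grows, which can express interactions between attributes that 1R cannot.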
Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data
 Artificial Intelligence
, 1997
Abstract

Cited by 9 (3 self)
Most learning algorithms work most effectively when their training data contain completely specified labeled samples. In many diagnostic tasks, however, the data will include the values of only some of the attributes; we model this as a blocking process that hides the values of those attributes from the learner. While blockers that remove the values of critical attributes can handicap a learner, this paper instead focuses on blockers that remove only conditionally irrelevant attribute values, i.e., values that are not needed to classify an instance given the values of the other, unblocked attributes. We first motivate and formalize this model of "superfluous-value blocking," and then demonstrate that these omissions can be useful by proving that certain classes that seem hard to learn in the general PAC model, viz. decision trees and DNF formulae, are trivial to learn in this setting. We then extend this model to deal with (1) theory revision (i.e., modifying an existing form...
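One way to see why such omissions are informative: for a decision tree, the attributes off the root-to-leaf evaluation path are exactly the conditionally irrelevant ones, so a blocker may hide them without changing the classification. The tree encoding and helper below are assumptions for illustration, not the paper's formalism.

```python
# Sketch of superfluous-value blocking for a decision tree: classify an
# example and return only the attribute values that were actually tested.

def classify_and_block(tree, example):
    """tree: ('leaf', label) or ('node', attr, left_subtree, right_subtree),
    taking the right branch when the attribute is 1.  Returns
    (label, blocked_example) keeping only values on the evaluation path."""
    blocked = {}
    while tree[0] == "node":
        _, attr, left, right = tree
        blocked[attr] = example[attr]           # this value was needed
        tree = right if example[attr] else left
    return tree[1], blocked                     # everything else stays hidden

# Tree for "x0 and x1": x1 is conditionally irrelevant whenever x0 = 0.
tree = ("node", 0, ("leaf", "-"), ("node", 1, ("leaf", "-"), ("leaf", "+")))
print(classify_and_block(tree, {0: 0, 1: 1}))  # ('-', {0: 0}): x1 blocked
print(classify_and_block(tree, {0: 1, 1: 1}))  # ('+', {0: 1, 1: 1})
```

The pattern of which values survive blocking leaks the tree's structure, which is the intuition behind the learnability results the abstract claims.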
Toward Attribute Efficient Learning of Decision Lists and Parities
 In Proceedings of COLT
, 2006
Abstract

Cited by 9 (1 self)
We consider two well-studied problems regarding attribute-efficient learning: learning decision lists and learning parity functions. First, we give an algorithm for learning decision lists of length k over n variables using 2^Õ(k^(1/3)) log n examples and time n^Õ(k^(1/3)). This is the first algorithm for learning decision lists that has both subexponential sample complexity and subexponential running time in the relevant parameters. Our approach establishes a relationship between attribute-efficient learning and polynomial threshold functions and is based on a new construction of low-degree, low-weight polynomial threshold functions for decision lists. For a wide range of parameters our construction matches a lower bound due to Beigel for decision lists and gives an essentially optimal tradeoff between polynomial threshold function degree and weight.
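The decision-list-to-threshold-function connection can be illustrated with the classical folklore construction (exponentially decreasing weights), which the paper improves upon with its low-weight version: a length-k decision list equals the sign of a weighted sum where rule i gets weight 2^(k-i+1), so the first firing rule dominates everything after it. The toy list below is an assumption for illustration.

```python
# Verify the classical decision-list-as-threshold-function construction
# by brute force over the Boolean cube.

from itertools import product

def dl_eval(rules, default, x):
    """Decision list: rules are (literal, output) pairs, output in {-1, +1};
    the first literal that fires decides, else `default` applies."""
    for lit, out in rules:
        if lit(x):
            return out
    return default

def ptf_eval(rules, default, x):
    """sign of: default + sum_i out_i * 2^(k-i) * lit_i(x), weights 2^k..2.
    The tail of the weight sequence plus |default| is below each weight,
    so the first firing rule determines the sign."""
    k = len(rules)
    s = default + sum(out * 2 ** (k - i) * lit(x)
                      for i, (lit, out) in enumerate(rules))
    return 1 if s > 0 else -1

# "if x0 then +1, elif not x1 then -1, else +1" over 3 Boolean variables:
rules = [(lambda x: x[0], 1), (lambda x: 1 - x[1], -1)]
default = 1
ok = all(dl_eval(rules, default, x) == ptf_eval(rules, default, x)
         for x in product((0, 1), repeat=3))
print(ok)  # the threshold function agrees with the list on all 8 inputs
```

The paper's contribution is a construction whose weights are far smaller than this 2^k scheme, which is what drives the improved sample complexity.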
Minimization of Decision Trees is Hard to Approximate
 In IEEE Conference on Computational Complexity
, 2002
Abstract

Cited by 7 (0 self)
Decision trees are representations of discrete functions with widespread applications in, e.g., complexity theory and data mining and exploration. In these areas it is important to obtain decision trees of small size. The minimization problem for decision trees is known to be NP-hard. In this paper the problem is shown to be hard even to approximate up to any constant factor.
On Learning in the Presence of Unspecified Attribute Values (Extended Abstract)
 In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory
, 1999
Abstract

Cited by 6 (0 self)
Nader H. Bshouty and David K. Wilson, Department of Computer Science, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4. Email: {bshouty, wilsond}@cpsc.ucalgary.ca. We continue the study of learning in the presence of unspecified attribute values (UAV) where some of the attributes of the examples may be unspecified [9, 4]. A UAV assignment x ∈ {0, 1, ?}^n, where ? indicates unspecified, is classified positive (negative) with respect to a Boolean function f if all possible assignments for the unspecified attributes result in a positive (negative) classification. Otherwise, the classification of x is ?. Given an example x ∈ {0, 1, ?}^n, the oracle UAV-MQ(x) responds with the classification of x with respect to the unknown target. Given a hypothesis h, the oracle UAV-EQ returns an example x ∈ {0, 1, ?}^n for which h(x) is incorrect, if such an example exists. The new contributions of this paper are as follows. First we define a new oracle called the ...
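The UAV classification rule above is easy to sketch directly: a partial assignment is positive only if every completion of its unspecified positions is positive, negative only if every completion is negative, and ? otherwise. The brute-force check below (exponential in the number of ?'s) is an illustration of the semantics, not an efficient oracle implementation.

```python
# Classify a partial assignment x in {0, 1, '?'}^n with respect to f,
# following the UAV semantics: '+' iff every completion is positive,
# '-' iff every completion is negative, '?' otherwise.

from itertools import product

def uav_classify(f, x, n):
    qpos = [i for i in range(n) if x[i] == "?"]
    seen = set()
    for bits in product((0, 1), repeat=len(qpos)):
        y = list(x)
        for i, b in zip(qpos, bits):
            y[i] = b
        seen.add(f(tuple(y)))                   # f returns 0 or 1
        if len(seen) == 2:
            return "?"                          # completions disagree
    return "+" if seen == {1} else "-"

# Target concept: f(x) = x0 OR x1.
f = lambda x: x[0] | x[1]
print(uav_classify(f, (1, "?", 0), 3))  # '+'  (x0 = 1 forces f = 1)
print(uav_classify(f, ("?", 0, 1), 3))  # '?'  (outcome depends on x0)
```

A UAV-MQ oracle answers exactly this three-valued query for the unknown target.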
Online Learning versus Offline Learning
Abstract

Cited by 5 (0 self)
We present an offline variant of the mistake-bound model of learning. Just like in the well-studied online model, a learner in the offline model has to learn an unknown concept from a sequence of elements of the instance space on which he makes "guess and test" trials. In both models, the aim of the learner is to make as few mistakes as possible. The difference between the models is that, while in the online model only the set of possible elements is known, in the offline model the sequence of elements (i.e., the identity of the elements as well as the order in which they are to be presented) is known to the learner in advance. We give a combinatorial characterization of the number of mistakes in the offline model. We apply this characterization to solve several natural questions that arise for the new model. First, we compare the mistake bounds of an offline learner to those of a learner learning the same concept classes in the online scenario. We show that the number of mis...
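As a point of reference for the online side of the comparison, here is a sketch of a "guess and test" run with the classic halving strategy over a finite concept class; each mistake at least halves the surviving version space, so the mistake count is at most log2 of the class size. The threshold-concept class below is an illustrative assumption (the offline learner studied in the paper additionally sees the whole sequence up front).

```python
# Halving strategy in the online mistake-bound model: predict the majority
# vote of the surviving concepts, then discard every concept inconsistent
# with the revealed label.

def halving_run(concepts, target, sequence):
    """Return the number of mistakes the halving learner makes."""
    version_space = list(concepts)
    mistakes = 0
    for x in sequence:
        votes = sum(c(x) for c in version_space)
        guess = 1 if 2 * votes >= len(version_space) else 0
        truth = target(x)
        if guess != truth:
            mistakes += 1
        version_space = [c for c in version_space if c(x) == truth]
    return mistakes

# Concept class: thresholds "x >= t" over points 0..7, for t = 0..8.
concepts = [lambda x, t=t: 1 if x >= t else 0 for t in range(9)]
target = concepts[5]
m = halving_run(concepts, target, [0, 7, 3, 5, 4, 6])
print(m)  # at most floor(log2(9)) = 3 mistakes on any sequence
```

The offline question is whether knowing the sequence [0, 7, 3, 5, 4, 6] in advance lets a learner guarantee fewer mistakes than this online bound.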
Computational Machine Learning in Theory and Praxis
, 1995
Abstract

Cited by 4 (0 self)
In the last few decades a computational approach to machine learning has emerged, based on paradigms from recursion theory and the theory of computation. Such ideas include learning in the limit, learning by enumeration, and probably approximately correct (pac) learning. These models usually are not suitable in practical situations. In contrast, statistics-based inference methods have enjoyed a long and distinguished career. Currently, Bayesian reasoning in various forms, minimum message length (MML), and minimum description length (MDL) are widely applied approaches. They are the tools to use with particular machine learning praxis such as simulated annealing, genetic algorithms, genetic programming, artificial neural networks, and the like. These statistical inference methods select the hypothesis which minimizes the sum of the length of the description of the hypothesis (also called `model') and the length of the description of the data relative to the hypothesis. It appears to us th...
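The two-part selection rule in the last sentence can be sketched directly: pick the hypothesis minimizing L(H) + L(D|H). The Bernoulli hypothesis grid and the helper name below are illustrative assumptions, chosen only to make the code lengths concrete.

```python
# Two-part MDL selection: model bits (to name a hypothesis from the grid)
# plus data bits (negative log-likelihood of the data under it, in bits).

from math import log2

def mdl_select(data, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Hypotheses: Bernoulli(p) for p in `grid`.  Returns the p minimizing
    log2(len(grid)) + sum of -log2 P(bit | p) over the data."""
    model_bits = log2(len(grid))
    best = None
    for p in grid:
        data_bits = sum(-log2(p if bit else 1 - p) for bit in data)
        total = model_bits + data_bits
        if best is None or total < best[0]:
            best = (total, p)
    return best[1]

data = [1] * 9 + [0]                 # nine ones, one zero
print(mdl_select(data))              # 0.9, matching the empirical rate
```

With a richer hypothesis grid the model-bits term grows, which is how the sum trades model complexity against fit; here the grid is fixed, so the data term dominates the choice.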
On PAC learning algorithms for rich Boolean function classes
, 2007
Abstract

Cited by 4 (1 self)
We give an overview of the fastest known algorithms for learning various expressive classes of Boolean functions in the Probably Approximately Correct (PAC) learning model. In addition to surveying previously known results, we use existing techniques to give the first known subexponential-time algorithms for PAC learning two natural and expressive classes of Boolean functions: sparse polynomial threshold functions over the Boolean cube {0, 1}^n and sparse GF(2) polynomials over {0, 1}^n.
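For concreteness, one of the two classes is easy to evaluate by hand: a sparse GF(2) polynomial is an XOR (sum mod 2) of a few monomials, each an AND of variables. The representation below (tuples of variable indices, with the empty tuple denoting the constant 1) is an illustrative assumption.

```python
# Evaluate a sparse GF(2) polynomial over {0, 1}^n.

def gf2_poly(monomials, x):
    """monomials: list of variable-index tuples; the polynomial is the XOR
    of the corresponding AND terms.  () denotes the constant term 1."""
    total = 0
    for mono in monomials:
        term = 1
        for i in mono:
            term &= x[i]            # AND of the monomial's variables
        total ^= term               # sum mod 2 across monomials
    return total

# p(x) = x0*x1 + x2 + 1 over GF(2):
p = [(0, 1), (2,), ()]
print(gf2_poly(p, (1, 1, 1)))  # 1 + 1 + 1 = 1 (mod 2)
print(gf2_poly(p, (1, 1, 0)))  # 1 + 0 + 1 = 0 (mod 2)
```

"Sparse" here means the number of monomials is small relative to 2^n, which is the parameter the subexponential-time algorithms exploit.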