Learning Rules with Local Exceptions
- In European Conference on Computational Learning Theory, 1993
Cited by 10 (0 self)
We present a learning algorithm for rule-based concept representations called ripple-down rule sets. Ripple-down rule sets allow us to deal with the exceptions for each rule separately by introducing exception rules, exception rules for each exception rule, and so on, up to a constant depth. These local exception rules are in contrast to decision lists, in which the exception rules must be placed into a global ordering of the rules. The localization of exceptions makes it possible to represent concepts that have no decision list representation. On the other hand, decision lists with a constant number of alternations between rules for different classes can be represented by constant-depth ripple-down rule sets with only a polynomial increase in size. Our algorithm is an Occam algorithm for constant-depth ripple-down rule sets and, hence, a PAC learning algorithm. It is based on repeatedly applying the greedy approximation method for the weighted set cover problem to find good exception rule set...
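To make the contrast with decision lists concrete, here is a minimal Python sketch of how a ripple-down rule set might classify an instance. It assumes a two-class reading in which top-level rules predict the positive class, their exceptions the negative class, exceptions of exceptions the positive class again, and so on. The `Rule` class and the `is_active` and `classify` helpers are hypothetical names, conditions are plain predicates rather than learned conjunctions, and the paper's exact semantics may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Instance = Dict[str, int]


@dataclass
class Rule:
    # Condition under which the rule fires (a conjunction in practice).
    condition: Callable[[Instance], bool]
    # Exception rules; they are consulted only when this rule fires.
    exceptions: List["Rule"] = field(default_factory=list)


def is_active(rule: Rule, x: Instance) -> bool:
    """A rule is active on x if it fires and none of its exceptions is active."""
    return rule.condition(x) and not any(is_active(e, x) for e in rule.exceptions)


def classify(rules: List[Rule], x: Instance) -> bool:
    """Positive iff some top-level rule is active; rule order plays no role."""
    return any(is_active(r, x) for r in rules)


# Depth-3 example: a=1 is positive, except when b=1, except when c=1.
rules = [Rule(lambda x: x["a"] == 1,
              [Rule(lambda x: x["b"] == 1,
                    [Rule(lambda x: x["c"] == 1)])])]
print(classify(rules, {"a": 1, "b": 1, "c": 0}))  # False
```

Because each exception is checked only under its parent rule, the result does not depend on the order of the top-level rules, unlike in a decision list.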
Seer: Maximum Likelihood Regression for Learning-Speed Curves
- University of Illinois at ..., 1995
Cited by 10 (0 self)
The research presented here focuses on modeling machine-learning performance. The thesis introduces Seer, a system that generates empirical observations of classification-learning performance and then uses those observations to create statistical models. The models can be used to predict the number of training examples needed to achieve a desired level of accuracy and the maximum accuracy possible given an unlimited number of training examples. Seer advances the state of the art with 1) models that embody the best constraints for classification learning and the most useful parameters, 2) algorithms that efficiently find maximum-likelihood models, and 3) a demonstration, on real-world data from three domains, of a practicable application of such modeling. The first part of the thesis gives an overview of the requirements for a good maximum-likelihood model of classification-learning performance. Next, reasonable design choices for such models are explored. Selecting among such models is a nonlinear programming task, but by exploiting appropriate problem constraints, the task is reduced to a nonlinear regression task that can be solved with an efficient iterative algorithm. The latter part of the thesis describes almost 100 experiments in the domains of soybean disease, heart disease, and audiological problems. The tests show that Seer is excellent at characterizing learning performance and that it seems to be as good as possible at predicting learning...
In Defense of C4.5: Notes on Learning One-Level Decision Trees
- Proc. of the 11th Int. Conf. on Machine Learning, 1994
Cited by 9 (1 self)
We discuss the implications of Holte's recently published article, which demonstrated that on the most commonly used data sets very simple classification rules are almost as accurate as decision trees produced by Quinlan's C4.5. We consider, in particular, the significance of Holte's results for the future of top-down induction of decision trees. To an extent, Holte questioned the point of further research on multilevel decision tree learning. We go through all parts of Holte's study in detail and try to put the results into perspective. We argue that the small (in absolute terms) difference in accuracy between 1R and C4.5 that Holte observed is still significant. We claim that C4.5 possesses additional accuracy-related advantages over 1R. In addition, we discuss the representativeness of the databases used by Holte. We compare empirically the optimal accuracies of multilevel and one-level decision trees and observe some significant differences. We point out several defici...
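For reference, here is a minimal sketch of Holte's 1R learner as it is commonly described: for each attribute, map every value to its majority class, and keep the attribute whose rule makes the fewest training errors. It assumes nominal attributes only; 1R's discretization of numeric attributes and its handling of missing values are omitted, and the function name `one_r` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def one_r(X: Sequence[Sequence[Hashable]],
          y: Sequence[Hashable]) -> Tuple[int, Dict[Hashable, Hashable], int]:
    """Return (best attribute index, value -> class rule, training errors)."""
    best = None
    for a in range(len(X[0])):
        # Class counts for every value of attribute a.
        counts: Dict[Hashable, Counter] = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[a]][label] += 1
        # Each value predicts its majority class; the other examples are errors.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
     ("rain", "hot"), ("overcast", "hot")]
y = ["no", "no", "yes", "yes", "yes"]
print(one_r(X, y))  # (0, {'sunny': 'no', 'rain': 'yes', 'overcast': 'yes'}, 0)
```

The resulting classifier is a one-level decision tree, which is the kind of model compared against C4.5's multilevel trees above.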
Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data
- Artificial Intelligence, 1997
Cited by 7 (3 self)
Most learning algorithms work most effectively when their training data contain completely specified labeled samples. In many diagnostic tasks, however, the data will include the values of only some of the attributes; we model this as a blocking process that hides the values of those attributes from the learner. While blockers that remove the values of critical attributes can handicap a learner, this paper instead focuses on blockers that remove only conditionally irrelevant attribute values, i.e., values that are not needed to classify an instance, given the values of the other unblocked attributes. We first motivate and formalize this model of "superfluous-value blocking," and then demonstrate that these omissions can be useful, by proving that certain classes that seem hard to learn in the general PAC model (viz., decision trees and DNF formulae) are trivial to learn in this setting. We then extend this model to deal with (1) theory revision (i.e., modifying an existing form...
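One way to picture superfluous-value blocking, assuming the target concept is a decision tree: the blocker reveals only the attributes tested on the instance's root-to-leaf path and hides the rest, since those values are not needed to classify it. The nested-tuple tree encoding and the `block` function below are illustrative assumptions, not the paper's formalization.

```python
from typing import Dict, Optional, Union

# A leaf is a class label; an internal node is (attribute, {value: subtree}).
Tree = Union[str, tuple]


def block(tree: Tree, x: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Hide (set to None) every attribute not tested on x's root-to-leaf path."""
    tested = set()
    node = tree
    while isinstance(node, tuple):
        attr, children = node
        tested.add(attr)
        node = children[x[attr]]
    return {a: (v if a in tested else None) for a, v in x.items()}


# Target: positive iff a = 1 and b = 1.
tree = ("a", {0: "neg", 1: ("b", {0: "neg", 1: "pos"})})
print(block(tree, {"a": 0, "b": 1, "c": 1}))  # {'a': 0, 'b': None, 'c': None}
```

Each blocked example exposes exactly one path of the target tree, which hints at why learning can become easy in this setting.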
Toward Attribute Efficient Learning of Decision Lists and Parities
- In Proceedings of COLT, 2006
Cited by 7 (0 self)
We consider two well-studied problems regarding attribute-efficient learning: learning decision lists and learning parity functions. First, we give an algorithm for learning decision lists of length k over n variables using 2^{Õ(k^{1/3})} log n examples and time n^{Õ(k^{1/3})}. This is the first algorithm for learning decision lists that has both subexponential sample complexity and subexponential running time in the relevant parameters. Our approach establishes a relationship between attribute-efficient learning and polynomial threshold functions and is based on a new construction of low-degree, low-weight polynomial threshold functions for decision lists. For a wide range of parameters our construction matches a lower bound due to Beigel for decision lists and gives an essentially optimal tradeoff between polynomial threshold function degree and weight.
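As background for the degree/weight tradeoff, the sketch below shows the classical construction that turns a length-k decision list into a degree-1 polynomial threshold function with integer weights as large as 2^k. This is the standard textbook construction, not the paper's low-degree, low-weight construction; encoding a decision list as `(variable, value, output)` triples with outputs in {-1, +1} is an assumption made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

# A decision list is ([(var, value, output), ...], default_output) with outputs in
# {-1, +1}; the literal (var, value) is satisfied when x[var] == value.
DecisionList = Tuple[List[Tuple[int, int, int]], int]


def eval_dl(dl: DecisionList, x: Sequence[int]) -> int:
    """Evaluate the list directly: the first satisfied literal decides."""
    rules, default = dl
    for var, value, out in rules:
        if x[var] == value:
            return out
    return default


def eval_ptf(dl: DecisionList, x: Sequence[int]) -> int:
    """Same function as the sign of a linear form with weights 2^k, ..., 2, 1."""
    rules, default = dl
    k = len(rules)
    total = default  # the default output gets weight 1
    for pos, (var, value, out) in enumerate(rules):
        literal = x[var] if value == 1 else 1 - x[var]  # linear in x
        total += out * 2 ** (k - pos) * literal
    return 1 if total > 0 else -1


dl = ([(0, 1, 1), (1, 0, -1), (2, 1, 1)], -1)
print(all(eval_dl(dl, x) == eval_ptf(dl, x) for x in product((0, 1), repeat=3)))  # True
```

Each weight exceeds the sum of all later weights plus the default's weight of 1, so the first satisfied literal decides the sign. The paper's contribution is to lower this exponential weight at the cost of higher degree.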
On Learning in the Presence of Unspecified Attribute Values (Extended Abstract)
- In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999
Cited by 6 (0 self)
Nader H. Bshouty, David K. Wilson. Department of Computer Science, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4. Email: {bshouty, wilsondg}@cpsc.ucalgary.ca
We continue the study of learning in the presence of unspecified attribute values (UAV), where some of the attributes of the examples may be unspecified [9, 4]. A UAV assignment x ∈ {0, 1, ?}^n, where ? indicates unspecified, is classified positive (negative) with respect to a Boolean function f if all possible assignments for the unspecified attributes result in a positive (negative) classification. Otherwise, the classification of x is ?. Given an example x ∈ {0, 1, ?}^n, the oracle UAV-MQ(x) responds with the classification of x with respect to the unknown target. Given a hypothesis h, the oracle UAV-EQ returns an example x ∈ {0, 1, ?}^n for which h(x) is incorrect, if such an example exists. The new contributions of this paper are as follows. First, we define a new oracle called the ...
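The UAV classification rule can be checked by brute force for small n: enumerate every completion of the unspecified attributes and see whether f agrees on all of them. The sketch below uses `None` for ?; `uav_classify` and `target` are hypothetical names, and the running time is exponential in the number of unspecified attributes, so it only illustrates the definition.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def uav_classify(f: Callable[[Tuple[int, ...]], bool],
                 x: Sequence[Optional[int]]) -> str:
    """Return '+', '-', or '?' for x in {0, 1, ?}^n, with None standing for '?'."""
    free = [i for i, v in enumerate(x) if v is None]
    outcomes = set()
    for bits in product((0, 1), repeat=len(free)):
        full = list(x)
        for i, b in zip(free, bits):
            full[i] = b
        outcomes.add(bool(f(tuple(full))))
        if len(outcomes) == 2:
            return "?"  # some completion is positive and another negative
    return "+" if outcomes == {True} else "-"


def target(z: Tuple[int, ...]) -> bool:
    return z[0] == 1 or z[1] == 1  # x1 OR x2


print(uav_classify(target, (1, None, None)), uav_classify(target, (0, None, 0)))  # + ?
```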
Minimization of Decision Trees is Hard to Approximate
- In IEEE Conference on Computational Complexity, 2002
Cited by 6 (0 self)
Decision trees are representations of discrete functions with widespread applications in, e.g., complexity theory and data mining and exploration. In these areas it is important to obtain decision trees of small size. The minimization problem for decision trees is known to be NP-hard. In this paper the problem is shown to be hard even to approximate within any constant factor.
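To make the minimization problem concrete, here is an exhaustive Python sketch that computes the fewest leaves of any decision tree for a Boolean function given as a truth table. It assumes size is measured in leaves (size can also be counted in nodes), the names `restrict` and `min_leaves` are hypothetical, and it runs in exponential time, so it illustrates the problem rather than circumventing its hardness.

```python
from functools import lru_cache
from typing import Tuple


def restrict(table: Tuple[int, ...], j: int, b: int) -> Tuple[int, ...]:
    """Fix variable j (bit j of the row index) to b and drop it from the table."""
    return tuple(t for idx, t in enumerate(table) if (idx >> j) & 1 == b)


@lru_cache(maxsize=None)
def min_leaves(table: Tuple[int, ...]) -> int:
    """Fewest leaves of any decision tree computing the truth table (exhaustive search)."""
    if len(set(table)) == 1:
        return 1  # constant function: a single leaf
    m = len(table).bit_length() - 1  # number of remaining variables
    return min(min_leaves(restrict(table, j, 0)) + min_leaves(restrict(table, j, 1))
               for j in range(m))


# Row index = x0 + 2*x1 (+ 4*x2 ...).
print(min_leaves((0, 0, 0, 1)), min_leaves((0, 1, 1, 0)))  # 3 4
```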
Computational Machine Learning in Theory and Praxis
- 1995
Cited by 4 (0 self)
In the last few decades a computational approach to machine learning has emerged based on paradigms from recursion theory and the theory of computation. Such ideas include learning in the limit, learning by enumeration, and probably approximately correct (PAC) learning. These models usually are not suitable in practical situations. In contrast, statistics-based inference methods have enjoyed a long and distinguished career. Currently, Bayesian reasoning in various forms, minimum message length (MML), and minimum description length (MDL) are widely applied approaches. They are the tools to use with practical machine learning methods such as simulated annealing, genetic algorithms, genetic programming, artificial neural networks, and the like. These statistical inference methods select the hypothesis that minimizes the sum of the length of the description of the hypothesis (also called 'model') and the length of the description of the data relative to the hypothesis. It appears to us th...
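The two-part selection criterion described above can be written as follows, with L(·) a code length in bits, \mathcal{H} the hypothesis class, and D the data (notation assumed here, not taken from the paper):

```latex
\hat{H} \;=\; \arg\min_{H \in \mathcal{H}} \bigl[\, L(H) + L(D \mid H) \,\bigr]
```

If the code lengths are chosen as L(H) = -log2 P(H) and L(D | H) = -log2 P(D | H), minimizing the sum is the same as maximizing P(H) P(D | H), which links this criterion to Bayesian maximum a posteriori selection.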
Online Learning versus Offline Learning
Cited by 4 (0 self)
We present an off-line variant of the mistake-bound model of learning. Just like in the well-studied on-line model, a learner in the off-line model has to learn an unknown concept from a sequence of elements of the instance space on which he makes "guess and test" trials. In both models, the aim of the learner is to make as few mistakes as possible. The difference between the models is that, while in the on-line model only the set of possible elements is known, in the off-line model the sequence of elements (i.e., the identity of the elements as well as the order in which they are to be presented) is known to the learner in advance. We give a combinatorial characterization of the number of mistakes in the off-line model. We apply this characterization to solve several natural questions that arise for the new model. First, we compare the mistake bounds of an off-line learner to those of a learner learning the same concept classes in the on-line scenario. We show that the number of mis...
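Because the off-line learner knows the sequence in advance, its optimal worst-case mistake count on a given sequence can be computed by brute-force minimax for a tiny concept class: the learner picks a prediction, an adversary picks any label still consistent with some concept, and the process recurses. The sketch below follows that definition directly; it is not the paper's combinatorial characterization, and representing each concept by its labels on the fixed sequence (and the name `offline_mistakes`) are assumptions.

```python
from functools import lru_cache
from itertools import product
from typing import FrozenSet, Sequence, Tuple

Labeling = Tuple[int, ...]  # a concept's labels on the known sequence x_1..x_m


def offline_mistakes(labelings: Sequence[Labeling]) -> int:
    """Optimal worst-case mistakes when the learner knows the sequence in advance."""
    m = len(labelings[0])

    @lru_cache(maxsize=None)
    def value(vs: FrozenSet[Labeling], t: int) -> int:
        if t == m or len(vs) <= 1:
            return 0  # nothing left to predict, or the target is already known
        split = {y: frozenset(c for c in vs if c[t] == y) for y in (0, 1)}
        # Learner picks prediction b; adversary picks a consistent label y.
        return min(
            max((b != y) + value(split[y], t + 1) for y in (0, 1) if split[y])
            for b in (0, 1)
        )

    return value(frozenset(labelings), 0)


thresholds = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
print(offline_mistakes(thresholds), offline_mistakes(list(product((0, 1), repeat=3))))  # 1 3
```

For example, threshold labelings of a sorted sequence cost at most one mistake off-line, while the class of all labelings forces one mistake per element.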
On PAC learning algorithms for rich Boolean function classes
- 2007
Cited by 4 (1 self)
We give an overview of the fastest known algorithms for learning various expressive classes of Boolean functions in the Probably Approximately Correct (PAC) learning model. In addition to surveying previously known results, we use existing techniques to give the first known subexponential-time algorithms for PAC learning two natural and expressive classes of Boolean functions: sparse polynomial threshold functions over the Boolean cube {0, 1}^n and sparse GF2 polynomials over {0, 1}^n.
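To make the two function classes concrete, the sketch below evaluates a sparse GF(2) polynomial (an XOR of a few monomials, each an AND of variables) and a sparse polynomial threshold function (the sign of a small integer-weighted sum of monomials) at a point of {0, 1}^n. The monomial-as-index-set encoding and the helper names are assumptions for illustration, and mapping a zero sum to -1 is a convention chosen here.

```python
from typing import FrozenSet, Iterable, Sequence, Tuple

Monomial = FrozenSet[int]  # indices of the variables ANDed together


def monomial(x: Sequence[int], mono: Monomial) -> int:
    """Value of the monomial at x in {0,1}^n; the empty monomial is the constant 1."""
    return int(all(x[i] for i in mono))


def eval_gf2(monos: Iterable[Monomial], x: Sequence[int]) -> int:
    """Sparse GF(2) polynomial: sum of monomials mod 2."""
    result = 0
    for mono in monos:
        result ^= monomial(x, mono)
    return result


def eval_ptf(terms: Iterable[Tuple[int, Monomial]], x: Sequence[int]) -> int:
    """Sparse PTF: sign of an integer-weighted sum of monomials (0 maps to -1 here)."""
    total = sum(w * monomial(x, mono) for w, mono in terms)
    return 1 if total > 0 else -1


# p(x) = x0*x1 + x2 + 1 over GF(2)
p = [frozenset({0, 1}), frozenset({2}), frozenset()]
# q(x) = sign(3*x0*x1 - 2*x2 + x3)
q = [(3, frozenset({0, 1})), (-2, frozenset({2})), (1, frozenset({3}))]
print(eval_gf2(p, (1, 1, 0)), eval_ptf(q, (0, 0, 1, 1)))  # 0 -1
```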

