Results 11 – 20 of 73
Agnostically Learning Decision Trees
, 2008
Abstract

Cited by 17 (7 self)
We give a query algorithm for agnostically learning decision trees with respect to the uniform distribution on inputs. Given black-box access to an arbitrary binary function f on the n-dimensional hypercube, our algorithm finds a function that agrees with f on almost (within an ε fraction) as many inputs as the best size-t decision tree, in time poly(n, t, 1/ε). This is the first polynomial-time algorithm for learning decision trees in a harsh noise model. We also give a proper agnostic learning algorithm for juntas, a subclass of decision trees, again using membership queries. Conceptually, the present paper parallels recent work towards agnostic learning of halfspaces [13]; algorithmically, it is significantly more challenging. The core of our learning algorithm is a procedure to implicitly solve a convex optimization problem over the L1 ball in 2^n dimensions using an approximate gradient projection method.
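The optimization at the core of the abstract above, gradient steps interleaved with projection onto an L1 ball, can be sketched in miniature. The sketch below is an illustrative toy (a few dimensions, a simple quadratic objective), not the paper's implicit 2^n-dimensional procedure; `project_l1_ball` is the standard sort-and-threshold Euclidean projection.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball of the given radius
    (standard sort-and-threshold method)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]            # magnitudes, descending
    cssv = np.cumsum(u)                     # cumulative sums of magnitudes
    # largest index rho at which the soft threshold stays positive
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (cssv - radius))[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient(grad, x0, radius, steps=100, lr=0.1):
    """Minimize a convex objective over the L1 ball: take a gradient
    step, then project back onto the feasible region."""
    x = project_l1_ball(np.asarray(x0, dtype=float), radius)
    for _ in range(steps):
        x = project_l1_ball(x - lr * grad(x), radius)
    return x
```

For instance, minimizing ||x − c||² with c = (3, 0) over the unit L1 ball drives the iterate to the boundary point (1, 0).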
Toward Attribute Efficient Learning of Decision Lists and Parities
Abstract

Cited by 11 (1 self)
We consider two well-studied problems regarding attribute-efficient learning: learning decision lists and learning parity functions. First, we give an algorithm for learning decision lists of length k over n variables using 2^{Õ(k^{1/3})} log n examples and time n^{Õ(k^{1/3})}. This is the first algorithm for learning decision lists that has both subexponential sample complexity and subexponential running time in the relevant parameters. Our approach is based on a new construction of low-degree, low-weight polynomial threshold functions for decision lists. For a wide range of parameters our construction matches a lower bound due to Beigel for decision lists and gives an essentially optimal trade-off between polynomial threshold function degree and weight. Second, we give an algorithm for learning an unknown parity function on k out of n variables using O(n^{1−1/k}) examples in poly(n) time. For k = o(log n) this yields the first polynomial-time algorithm for learning parity on a super-constant number of variables with sublinear sample complexity. We also give a simple algorithm for learning an unknown size-k parity using O(k log n) examples in n^{k/2} time, which improves on the naive n^k time bound of exhaustive search.
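The "naive exhaustive search" baseline mentioned at the end is easy to make concrete: try every candidate set of at most k relevant variables and keep one whose parity is consistent with all examples. A minimal sketch (function name and parameters are my own, not from the paper):

```python
from itertools import combinations
import numpy as np

def learn_parity_bruteforce(X, y, k):
    """Exhaustive search for a set S of at most k coordinates whose
    parity (sum mod 2 over S) matches the label of every example.
    Roughly n^k candidate sets are examined in the worst case."""
    n = X.shape[1]
    for size in range(k + 1):
        for S in combinations(range(n), size):
            if np.all(X[:, list(S)].sum(axis=1) % 2 == y):
                return S
    return None
```

With O(k log n) random examples the true size-k parity is, with high probability, the only consistent candidate; the abstract's contribution is cutting this search to n^{k/2} time.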
Seer: Maximum Likelihood Regression for Learning-Speed Curves
 University of Illinois at
, 1995
Abstract

Cited by 10 (0 self)
The research presented here focuses on modeling machine-learning performance. The thesis introduces Seer, a system that generates empirical observations of classification-learning performance and then uses those observations to create statistical models. The models can be used to predict the number of training examples needed to achieve a desired level of accuracy, and the maximum accuracy possible given an unlimited number of training examples. Seer advances the state of the art with 1) models that embody the best constraints for classification learning and the most useful parameters, 2) algorithms that efficiently find maximum-likelihood models, and 3) a demonstration, on real-world data from three domains, of a practicable application of such modeling. The first part of the thesis gives an overview of the requirements for a good maximum-likelihood model of classification-learning performance. Next, reasonable design choices for such models are explored. Selection among such models is a task of nonlinear programming, but by exploiting appropriate problem constraints, the task is reduced to a nonlinear regression task that can be solved with an efficient iterative algorithm. The latter part of the thesis describes almost 100 experiments in the domains of soybean disease, heart disease, and audiological problems. The tests show that Seer is excellent at characterizing learning performance and that it seems to be as good as possible at predicting learning ...
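The curve-fitting step Seer performs can be illustrated with a much-simplified stand-in: fit an inverse power-law learning curve acc(m) ≈ a − b·m^(−c), where a plays the role of the maximum accuracy attainable with unlimited examples. This is a hypothetical least-squares sketch, not Seer's actual maximum-likelihood model; it exploits the fact that for a fixed exponent c the model is linear in (a, b):

```python
import numpy as np

def fit_learning_curve(m, acc):
    """Fit acc ~ a - b * m**(-c) to observed (training-set size,
    accuracy) pairs. Scan candidate exponents c; at each, the model is
    linear in (a, b), so solve by least squares and keep the best c."""
    best = None
    for c in np.linspace(0.05, 2.0, 40):
        A = np.column_stack([np.ones(len(m)), -(m.astype(float) ** (-c))])
        coef, *_ = np.linalg.lstsq(A, acc, rcond=None)
        sse = float(((acc - A @ coef) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], c)
    _, a, b, c = best
    return a, b, c   # a estimates accuracy with unlimited examples
```

This mirrors the thesis's reduction: the full selection problem is nonlinear, but fixing part of the parameterization leaves an easy regression subproblem.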
Learning Rules with Local Exceptions
 in European Conference on Computational Learning Theory
, 1993
Minimization of Decision Trees is Hard to Approximate
 In IEEE Conference on Computational Complexity
, 2002
Abstract

Cited by 9 (0 self)
Decision trees are representations of discrete functions with widespread applications in, e.g., complexity theory and in data mining and exploration. In these areas it is important to obtain decision trees of small size. The minimization problem for decision trees is known to be NP-hard. In this paper, the problem is shown to be hard even to approximate to within any constant factor.
In Defense of C4.5: Notes on Learning One-Level Decision Trees
 Proc. of the 11th Int. Conf. on Machine Learning
, 1994
Abstract

Cited by 9 (1 self)
We discuss the implications of Holte's recently published article, which demonstrated that on the most commonly used data very simple classification rules are almost as accurate as decision trees produced by Quinlan's C4.5. We consider, in particular, the significance of Holte's results for the future of top-down induction of decision trees. To an extent, Holte questioned the sense of further research on multi-level decision tree learning. We go in detail through all the parts of Holte's study. We try to put the results into perspective. We argue that the (in absolute terms) small difference in accuracy between 1R and C4.5 that was witnessed by Holte is still significant. We claim that C4.5 possesses additional accuracy-related advantages over 1R. In addition we discuss the representativeness of the databases used by Holte. We compare empirically the optimal accuracies of multi-level and one-level decision trees and observe some significant differences. We point out several defici...
Knowing what doesn’t matter: exploiting the omission of irrelevant data
 Artificial Intelligence
, 1997
Abstract

Cited by 8 (3 self)
Most learning algorithms work most effectively when their training data contain completely specified labeled samples. In many diagnostic tasks, however, the data will include the values of only some of the attributes; we model this as a blocking process that hides the values of those attributes from the learner. While blockers that remove the values of critical attributes can handicap a learner, this paper instead focuses on blockers that remove only conditionally irrelevant attribute values, i.e., values that are not needed to classify an instance, given the values of the other unblocked attributes. We first motivate and formalize this model of "superfluous-value blocking," and then demonstrate that these omissions can be useful, by proving that certain classes that seem hard to learn in the general PAC model (viz., decision trees and DNF formulae) are trivial to learn in this setting. We then extend this model to deal with (1) theory revision (i.e., modifying an existing formula); (2) blockers that occasionally include superfluous values or exclude required values; and (3) other corruptions of the training data.
Learning with decision lists of data-dependent features
 Journal of Machine Learning Research
, 2005
Abstract

Cited by 6 (1 self)
We present a learning algorithm for decision lists which allows features constructed from the data and allows a trade-off between accuracy and complexity. We provide bounds on the generalization error of this learning algorithm in terms of the number of errors and the size of the classifier it finds on the training data. We also compare its performance on some natural data sets with the set covering machine and the support vector machine. Furthermore, we show that the proposed bounds on the generalization error provide effective guides for model selection.
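For reference, evaluating a decision list is a first-match rule scan. The sketch below (names are mine, plain Python, none of the paper's feature-construction machinery) shows the classifier form whose size enters the stated generalization bounds; in the paper's setting each `test` would be a feature constructed from the training data itself:

```python
def decision_list_predict(rules, default, x):
    """Evaluate a decision list: rules is an ordered list of
    (test, label) pairs; the first test that fires on x decides the
    label, and the default label applies if none fire."""
    for test, label in rules:
        if test(x):
            return label
    return default
```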
On Learning in the Presence of Unspecified Attribute Values (Extended Abstract)
 In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory
, 1999
Abstract

Cited by 6 (0 self)
Nader H. Bshouty and David K. Wilson, Department of Computer Science, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4. Email: {bshouty, wilsond}@cpsc.ucalgary.ca. We continue the study of learning in the presence of unspecified attribute values (UAV) where some of the attributes of the examples may be unspecified [9, 4]. A UAV assignment x ∈ {0, 1, ?}^n, where ? indicates unspecified, is classified positive (negative) with respect to a Boolean function f if all possible assignments for the unspecified attributes result in a positive (negative) classification. Otherwise, the classification of x is ?. Given an example x ∈ {0, 1, ?}^n, the oracle UAVMQ(x) responds with the classification of x with respect to the unknown target. Given a hypothesis h, the oracle UAVEQ returns an example x ∈ {0, 1, ?}^n for which h(x) is incorrect, if such an example exists. The new contributions of this paper are as follows. First we define a new oracle called the ...
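The UAV classification rule defined in the abstract (positive or negative only when every completion of the unspecified attributes agrees, and ? otherwise) is direct to state in code. A minimal sketch, with names of my own choosing; enumeration over completions is exponential in the number of '?' entries, so this is illustrative only:

```python
from itertools import product

def uav_classify(f, x):
    """Classify a UAV assignment x over {0, 1, '?'} with respect to a
    Boolean function f: return 1 (or 0) if every completion of the '?'
    positions yields that label, and '?' if completions disagree."""
    unknown = [i for i, v in enumerate(x) if v == '?']
    labels = set()
    for bits in product([0, 1], repeat=len(unknown)):
        full = list(x)
        for i, b in zip(unknown, bits):
            full[i] = b
        labels.add(f(full))
        if len(labels) > 1:      # completions disagree: undetermined
            return '?'
    return labels.pop()
```

This is the answer the UAVMQ oracle returns; UAVEQ compares a hypothesis h against the target under the same rule.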
Learning random log-depth decision trees under the uniform distribution
, 2004
Abstract

Cited by 6 (0 self)
random log-depth decision trees under the uniform distribution