Results 1 - 10 of 51
Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
Machine Learning, 1988
Cited by 687 (5 self)
learning Boolean functions, linear-threshold algorithms. Abstract. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
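The abstract above describes the mistake-driven protocol but not the update rule itself. As a rough illustration only, the following Python sketch shows one multiplicative-weight linear-threshold learner of the kind usually described under the name Winnow for monotone disjunctions. The function names, the factor `alpha = 2`, the threshold `n / 2`, and the choice to zero out weights on a false positive are assumptions made for this sketch, not details taken from the listing.

```python
from itertools import product


def winnow_predict(weights, x, threshold):
    """Predict 1 if the summed weight of the active attributes exceeds the threshold."""
    total = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if total > threshold else 0


def winnow_train(examples, n, alpha=2.0):
    """Run one pass over (x, label) pairs and return (weights, mistake count).

    Assumed setup for this sketch: all weights start at 1, threshold is n / 2.
    The hypothesis changes only when the current prediction is wrong.
    """
    weights = [1.0] * n
    threshold = n / 2
    mistakes = 0
    for x, label in examples:
        if winnow_predict(weights, x, threshold) == label:
            continue  # correct prediction: keep the current hypothesis
        mistakes += 1
        if label == 1:
            # False negative: multiply the weights of active attributes by alpha.
            weights = [w * alpha if xi else w for w, xi in zip(weights, x)]
        else:
            # False positive: active attributes cannot be relevant, so zero them.
            weights = [0.0 if xi else w for w, xi in zip(weights, x)]
    return weights, mistakes


# Hypothetical target: x[0] OR x[3] over 8 attributes, 6 of them irrelevant.
n = 8
examples = [(x, 1 if (x[0] or x[3]) else 0) for x in product((0, 1), repeat=n)]
weights, mistakes = winnow_train(examples, n)
print("mistakes on first pass:", mistakes)
```

Because weights change only on mistakes, the work per example is linear in the number of attributes, which matches the abstract's remark about efficiency in time and space.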
Bounds on the sample complexity of Bayesian learning using information theory and the VC-dimension
Machine Learning, 1994
A Guided Tour Across the Boundaries of Learning Recursive Languages
Lecture Notes in Artificial Intelligence, 1994
Cited by 57 (29 self)
The present paper deals with the learnability of indexed families of uniformly recursive languages from positive data as well as from both positive and negative data. We consider the influence of various monotonicity constraints on the learning process, and provide a thorough study concerning the influence of several parameters. In particular, we present examples pointing to typical problems and solutions in the field. Then we provide a unifying framework for learning. Furthermore, we survey results on how learnability depends on the hypothesis space, and on order independence. Moreover, new results dealing with the efficiency of learning are provided. First, we investigate the power of iterative learning algorithms. The second measure of efficiency studied is the number of mind changes a learning algorithm is allowed to perform. In this setting we consider the question of whether or not the monotonicity constraints introduced influence the efficiency of learning algo...
Online Prediction and Conversion Strategies
Machine Learning, 1994
Cited by 49 (18 self)
We study the problem of deterministically predicting boolean values by combining the boolean predictions...
Teaching a Smarter Learner
Journal of Computer and System Sciences, 1994
Cited by 43 (1 self)
We introduce a formal model of teaching in which the teacher is tailored to a particular learner, yet the teaching protocol is designed so that no collusion is possible. Not surprisingly, such a model remedies the nonintuitive aspects of other models in which the teacher must successfully teach any consistent learner. We prove that any class that can be exactly identified by a deterministic polynomial-time algorithm with access to a very rich set of example-based queries is teachable by a computationally unbounded teacher and a polynomial-time learner. In addition, we present other general results relating this model of teaching to various previous results. We also consider the problem of designing teacher/learner pairs in which both the teacher and learner are polynomial-time algorithms and describe teacher/learner pairs for the classes of 1-decision lists and Horn sentences. 1 Introduction Recently, there has been interest in developing formal models of teaching [4, 10, ...
Learning binary relations and total orders
In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, 1989
Cited by 32 (5 self)
Abstract. We study the problem of designing polynomial prediction algorithms for learning binary relations. We study these problems under an online model in which the instances are drawn by the learner, by a helpful teacher, by an adversary or according to a probability distribution on the instance space. We represent the relation as an n × m binary matrix, and present results for when the matrix is restricted to have at most k distinct row types, and when it is constrained by requiring that the predicate form a total order.
Prediction on a graph with the perceptron
In Neural Information Processing Systems, 2006
Cited by 25 (6 self)
We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated.
Online Prediction on Large Diameter Graphs
Cited by 16 (2 self)
We continue our study of online prediction of the labelling of a graph. We show a fundamental limitation of Laplacian-based algorithms: if the graph has a large diameter then the number of mistakes made by such algorithms may be proportional to the square root of the number of vertices, even when tackling simple problems. We overcome this drawback by means of an efficient algorithm which achieves a logarithmic mistake bound. It is based on the notion of a spine, a path graph which provides a linear embedding of the original graph. In practice, graphs may exhibit cluster structure; thus in the last part, we present a modified algorithm which achieves the “best of both worlds”: it performs well locally in the presence of cluster structure, and globally on large diameter graphs.