Results 1-10 of 37
Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
 Machine Learning
, 1988
Cited by 680 (5 self)
Keywords: learning Boolean functions, linear-threshold algorithms. Abstract. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
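The mistake-driven, linear-threshold scheme described in this abstract can be illustrated with a small sketch in the style of Winnow1 for monotone disjunctions. The abstract does not give the update rule or parameters, so everything specific here is an assumption made for illustration: the function names, the threshold theta = n/2, the multiplier alpha = 2, and the repeated passes over a fixed example list.

```python
from itertools import product
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]


def predict(weights: Sequence[float], x: Sequence[int], theta: float) -> int:
    """Linear-threshold prediction: 1 if the active weights sum above theta."""
    return 1 if sum(w for w, xi in zip(weights, x) if xi) > theta else 0


def winnow1(examples: List[Example], n: int, alpha: float = 2.0,
            max_passes: int = 100) -> Tuple[List[float], float, int]:
    """Mistake-driven multiplicative updates (assumed Winnow1-style rule).

    Returns the final weights, the threshold, and the total mistake count.
    """
    theta = n / 2          # assumed threshold choice
    weights = [1.0] * n    # all attributes start with equal weight
    mistakes = 0
    for _ in range(max_passes):
        clean_pass = True
        for x, y in examples:
            if predict(weights, x, theta) == y:
                continue   # hypothesis changes only on a mistake
            clean_pass = False
            mistakes += 1
            if y == 1:
                # Promotion: false negative, multiply active weights by alpha.
                weights = [w * alpha if xi else w for w, xi in zip(weights, x)]
            else:
                # Elimination: false positive, zero out active weights.
                weights = [0.0 if xi else w for w, xi in zip(weights, x)]
        if clean_pass:
            break
    return weights, theta, mistakes


def disjunction_examples(n: int, relevant: Sequence[int]) -> List[Example]:
    """All 2^n Boolean vectors, labeled by an OR over the relevant indices."""
    return [(x, int(any(x[i] for i in relevant)))
            for x in product((0, 1), repeat=n)]
```

For instance, `winnow1(disjunction_examples(8, [0, 1]), 8)` trains on a target with two relevant and six irrelevant attributes. Elimination never touches a relevant weight, because a relevant attribute is never active in a negative example of a monotone disjunction.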
The Weighted Majority Algorithm
, 1994
Cited by 678 (39 self)
We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case that the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool A that makes at most m mistakes then the Weighted Majority Algorithm will make at most c(log |A| + m) mi...
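The weighted-voting idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the penalty factor beta = 1/2, updating weights on every trial, and breaking ties in favor of predicting 1 are all assumptions.

```python
from typing import List, Sequence, Tuple


def weighted_majority(expert_predictions: Sequence[Sequence[int]],
                      outcomes: Sequence[int],
                      beta: float = 0.5) -> Tuple[int, List[float]]:
    """Combine binary expert predictions by weighted vote.

    expert_predictions[t][i] is expert i's 0/1 prediction on trial t.
    Returns the compound algorithm's mistake count and the final weights.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        vote_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        y_hat = 1 if vote_1 >= vote_0 else 0  # assumed tie-break toward 1
        if y_hat != y:
            mistakes += 1
        # Shrink the weight of every expert that was wrong on this trial.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes, weights
```

An expert that never errs keeps weight 1 throughout, while each wrong prediction by any other expert multiplies its weight by beta, so the vote is increasingly dominated by the better members of the pool.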
Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension
 Machine Learning
, 1994
Cited by 108 (12 self)
In this paper we study a Bayesian or average-case model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models. 1 Introduction Consider a simple concept learning model in which the learner attempts to infer an unknown target concept f, chosen from a known concept class F of {0, 1}-valued functions over an instance space X....
A Guided Tour Across the Boundaries of Learning Recursive Languages
 Lecture Notes in Artificial Intelligence
, 1994
Cited by 56 (29 self)
The present paper deals with the learnability of indexed families of uniformly recursive languages from positive data as well as from both, positive and negative data. We consider the influence of various monotonicity constraints to the learning process, and provide a thorough study concerning the influence of several parameters. In particular, we present examples pointing to typical problems and solutions in the field. Then we provide a unifying framework for learning. Furthermore, we survey results concerning learnability in dependence on the hypothesis space, and concerning order independence. Moreover, new results dealing with the efficiency of learning are provided. First, we investigate the power of iterative learning algorithms. The second measure of efficiency studied is the number of mind changes a learning algorithm is allowed to perform. In this setting we consider the problem whether or not the monotonicity constraints introduced do influence the efficiency of learning algo...
Online Prediction and Conversion Strategies
 Machine Learning
, 1994
Cited by 50 (18 self)
We study the problem of deterministically predicting boolean values by combining the boolean predictions...
Teaching a Smarter Learner
 Journal of Computer and System Sciences
, 1994
Cited by 39 (1 self)
We introduce a formal model of teaching in which the teacher is tailored to a particular learner, yet the teaching protocol is designed so that no collusion is possible. Not surprisingly, such a model remedies the nonintuitive aspects of other models in which the teacher must successfully teach any consistent learner. We prove that any class that can be exactly identified by a deterministic polynomial-time algorithm with access to a very rich set of example-based queries is teachable by a computationally unbounded teacher and a polynomial-time learner. In addition, we present other general results relating this model of teaching to various previous results. We also consider the problem of designing teacher/learner pairs in which both the teacher and learner are polynomial-time algorithms and describe teacher/learner pairs for the classes of 1-decision lists and Horn sentences. 1 Introduction Recently, there has been interest in developing formal models of teaching [4, 10, ...
Learning binary relations and total orders
 In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science
, 1989
Cited by 36 (6 self)
Abstract. We study the problem of designing polynomial prediction algorithms for learning binary relations. We study these problems under an online model in which the instances are drawn by the learner, by a helpful teacher, by an adversary or according to a probability distribution on the instance space. We represent the relation as an n × m binary matrix, and present results for when the matrix is restricted to have at most k distinct row types, and when it is constrained by requiring that the predicate form a total order.
Prediction on a graph with the perceptron
 in Neural Information Processing Systems
, 2006
Cited by 26 (6 self)
We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated.
Online Prediction on Large Diameter Graphs
Cited by 17 (2 self)
We continue our study of online prediction of the labelling of a graph. We show a fundamental limitation of Laplacian-based algorithms: if the graph has a large diameter then the number of mistakes made by such algorithms may be proportional to the square root of the number of vertices, even when tackling simple problems. We overcome this drawback by means of an efficient algorithm which achieves a logarithmic mistake bound. It is based on the notion of a spine, a path graph which provides a linear embedding of the original graph. In practice, graphs may exhibit cluster structure; thus in the last part, we present a modified algorithm which achieves the “best of both worlds”: it performs well locally in the presence of cluster structure, and globally on large diameter graphs.