Results 1–10 of 31
The Weighted Majority Algorithm
, 1994
"... We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case that the learner has reason to believe that one of some pool of kn ..."
Abstract

Cited by 671 (38 self)
 Add to MetaCart
We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case that the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool A that makes at most m mistakes, then the Weighted Majority Algorithm will make at most c(log |A| + m) mistakes ...
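The voting-and-reweighting scheme the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's exact analysis: the names `experts`, `trials`, and the penalty factor `beta` are illustrative choices, with `beta = 1/2` corresponding to halving the weight of each expert that errs.

```python
def weighted_majority(experts, trials, beta=0.5):
    """Weighted Majority sketch: predict by weighted vote over a pool of
    experts, then multiply the weight of every expert that was wrong on
    the trial by beta.  `experts` is a list of callables mapping a trial
    input to a 0/1 prediction; `trials` is a list of (x, label) pairs."""
    weights = [1.0] * len(experts)
    mistakes = 0
    for x, label in trials:
        votes = [e(x) for e in experts]
        # weighted vote: predict 1 iff the weight behind "1" is at least
        # half of the total weight
        w1 = sum(w for w, v in zip(weights, votes) if v == 1)
        prediction = 1 if w1 >= sum(weights) / 2 else 0
        if prediction != label:
            mistakes += 1
        # penalize exactly the experts that predicted incorrectly
        weights = [w * beta if v != label else w
                   for w, v in zip(weights, votes)]
    return mistakes
```

If one expert in the pool is perfect (m = 0), the bound c(log |A| + m) promises only O(log |A|) mistakes overall, since the weights of the erring experts decay geometrically.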
The strength of weak learnability
 Machine Learning
, 1990
"... Abstract. This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distributionfree (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a Source of examples of the unknown concept, the learner with h ..."
Abstract

Cited by 667 (22 self)
 Add to MetaCart
Abstract. This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
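The weak-to-strong conversion in this line of work is most easily demonstrated with AdaBoost (Freund & Schapire), a later and simpler boosting procedure than the majority-of-majorities construction of this paper. The sketch below, an assumption-laden toy rather than the paper's method, boosts one-dimensional threshold "stumps" (each a weak learner) into a strong classifier on labels in {-1, +1}; the domain `range(9)` of thresholds and the function names are illustrative.

```python
import math

def train_stump(points, weights):
    # Weak learner: exhaustive search over thresholds t in 0..8 and
    # polarities s for the stump h(x) = s if x >= t else -s with the
    # lowest weighted error.  (Threshold range is hard-coded for the toy.)
    best = None
    for t in range(9):
        for s in (1, -1):
            err = sum(w for (x, y), w in zip(points, weights)
                      if (s if x >= t else -s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(points, rounds=3):
    """Combine weak stumps into a strong classifier by reweighting."""
    n = len(points)
    weights = [1.0 / n] * n
    ensemble = []  # (alpha, threshold, polarity) triples
    for _ in range(rounds):
        err, t, s = train_stump(points, weights)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # re-weight: emphasize the examples this stump got wrong
        weights = [w * math.exp(-alpha * y * (s if x >= t else -s))
                   for (x, y), w in zip(points, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1
```

On an interval-shaped target, no single stump is consistent, yet a weighted majority of three stumps classifies every example correctly, which is the "weak implies strong" phenomenon in miniature.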
A bound on the label complexity of agnostic active learning
 In Proc. of the 24th international conference on Machine learning
, 2007
"... We study the label complexity of poolbased active learning in the agnostic PAC model. Specifically, we derive general bounds on the number of label requests made by the A 2 algorithm proposed by Balcan, Beygelzimer & Langford (Balcan et al., 2006). This represents the first nontrivial generalpurpo ..."
Abstract

Cited by 65 (10 self)
 Add to MetaCart
We study the label complexity of pool-based active learning in the agnostic PAC model. Specifically, we derive general bounds on the number of label requests made by the A² algorithm proposed by Balcan, Beygelzimer & Langford (Balcan et al., 2006). This represents the first nontrivial general-purpose upper bound on label complexity in the agnostic PAC model.
Sample compression, learnability, and the Vapnik-Chervonenkis dimension
 Machine Learning
, 1995
"... Within the framework of paclearning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class C ` 2 X consists of a compression function and a reconstruction function. The compression function r ..."
Abstract

Cited by 61 (3 self)
 Add to MetaCart
Within the framework of PAC learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class C ⊆ 2^X consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is PAC-learnable. Previous work has shown that a class is PAC-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class is ...
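A concrete instance of the compression/reconstruction pair the abstract defines: for the class of closed intervals on the line, a compression set of size k = 2 suffices. The function names below are illustrative, not the paper's notation.

```python
def compress(sample):
    """Compression function for 1-D interval concepts: keep at most the
    two extreme positive examples (a compression set of size k = 2).
    `sample` is a list of (x, label) pairs consistent with some interval."""
    positives = [x for x, y in sample if y]
    if not positives:
        return []
    return [min(positives), max(positives)]

def reconstruct(compression_set):
    """Reconstruction function: the tightest interval containing the
    compression set; an empty set yields the always-false hypothesis."""
    if not compression_set:
        return lambda x: False
    lo, hi = min(compression_set), max(compression_set)
    return lambda x: lo <= x <= hi
```

For any sample consistent with an interval concept, the reconstructed hypothesis agrees with the full sample even though only two examples were retained, which is exactly the consistency requirement stated above.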
Teaching a Smarter Learner
 Journal of Computer and System Sciences
, 1994
"... We introduce a formal model of teaching in which the teacher is tailored to a particular learner, yet the teaching protocol is designed so that no collusion is possible. Not surprisingly, such a model remedies the nonintuitive aspects of other models in which the teacher must successfully teach ..."
Abstract

Cited by 40 (1 self)
 Add to MetaCart
We introduce a formal model of teaching in which the teacher is tailored to a particular learner, yet the teaching protocol is designed so that no collusion is possible. Not surprisingly, such a model remedies the non-intuitive aspects of other models in which the teacher must successfully teach any consistent learner. We prove that any class that can be exactly identified by a deterministic polynomial-time algorithm with access to a very rich set of example-based queries is teachable by a computationally unbounded teacher and a polynomial-time learner. In addition, we present other general results relating this model of teaching to various previous results. We also consider the problem of designing teacher/learner pairs in which both the teacher and learner are polynomial-time algorithms and describe teacher/learner pairs for the classes of 1-decision lists and Horn sentences.
The Learnability of Description Logics with Equality Constraints
 Machine Learning
, 1994
"... Although there is an increasing amount of experimental research on learning concepts expressed in firstorder logic, there are still relatively few formal results on the polynomial learnability of firstorder representations from examples. Most previous analyses in the pacmodel have focused on s ..."
Abstract

Cited by 36 (3 self)
 Add to MetaCart
Although there is an increasing amount of experimental research on learning concepts expressed in first-order logic, there are still relatively few formal results on the polynomial learnability of first-order representations from examples. Most previous analyses in the PAC model have focused on subsets of Prolog, and only a few highly restricted subsets have been shown to be learnable. In this paper, we will study instead the learnability of the restricted first-order logics known as "description logics", also sometimes called "terminological logics" or "KL-ONE-type languages". Description logics are also subsets of predicate calculus, but are expressed using a different syntax, allowing a different set of syntactic restrictions to be explored. We first define a simple description logic, summarize some results on its expressive power, and then analyze its learnability. It is shown that the full logic cannot be tractably learned. However, syntactic restrictions exist that enable tractable learning from positive examples alone, independent of the size of the vocabulary used to describe examples. The learnable sublanguage appears to be incomparable in expressive power to any subset of first-order logic previously known to be learnable.
A Subexponential Exact Learning Algorithm for DNF Using Equivalence Queries
 Information Processing Letters
, 1996
"... We present a 2 time exact learning algorithm for polynomial size DNF using equivalence queries only. In particular, DNF is PAClearnable in subexponential time under any distribution. This is the first subexponential time PAClearning algorithm for DNF under any distribution. ..."
Abstract

Cited by 26 (0 self)
 Add to MetaCart
We present a subexponential-time exact learning algorithm for polynomial-size DNF using equivalence queries only. In particular, DNF is PAC-learnable in subexponential time under any distribution. This is the first subexponential-time PAC-learning algorithm for DNF under any distribution.
Algebraic Foundation and Improved Methods of Induction of Ripple Down Rules
 In
, 1996
"... Ripple down rules (RDR), that is rules with hierarchical exceptions, are used in knowledge acquisition because they provide a well intelligible and modifiable representation for even very large expert systems. In this paper a formal semantics for RDRs is proposed, that covers first order rules as we ..."
Abstract

Cited by 22 (2 self)
 Add to MetaCart
Ripple down rules (RDR), that is, rules with hierarchical exceptions, are used in knowledge acquisition because they provide a readily intelligible and modifiable representation for even very large expert systems. In this paper a formal semantics for RDRs is proposed that covers first-order rules as well as attribute-value-based rules. An algebraic foundation is proposed, including simplification of RDRs and transformation of RDRs into flat lists of rules and ripple down rule sets; hence these knowledge representation schemes are put into perspective. It is shown that an RDR has a shorter description length than an equivalent decision list. Induction of rules with exceptions is characterized as bidirectional movement in the hypothesis space, while known algorithms for learning rules or decision trees either perform a top-down specialization of the most general or a bottom-up generalization of the most specific hypothesis. Known algorithms for induction of RDRs are summarized and compared ...
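The "rules with hierarchical exceptions" semantics can be made concrete with a tiny evaluator. This is an illustrative sketch under the usual informal RDR reading, not the algebraic semantics the paper develops: a node's conclusion holds when its condition fires, unless a rule in its exception branch fires and overrides it; otherwise evaluation falls through to the else branch.

```python
class RDR:
    """Minimal ripple-down-rule node (names are illustrative)."""

    def __init__(self, cond, concl, except_=None, else_=None):
        self.cond = cond          # predicate over a case
        self.concl = concl        # conclusion if cond fires unrefuted
        self.except_ = except_    # exception subtree, consulted on firing
        self.else_ = else_        # fall-through subtree if cond fails

    def classify(self, case, default=None):
        if self.cond(case):
            if self.except_ is not None:
                # an exception only overrides if some rule in it fires
                verdict = self.except_.classify(case, default=None)
                if verdict is not None:
                    return verdict
            return self.concl
        if self.else_ is not None:
            return self.else_.classify(case, default)
        return default
```

The classic toy example: "birds fly" with the exception "penguins walk". The exception node is consulted only for cases that already satisfied the parent rule, which is what gives RDRs their short description length relative to a flat decision list.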
Separating Distribution-Free and Mistake-Bound Learning Models over the Boolean Domain
 SIAM J. Comput.
, 1990
"... Two of the most commonly used models in computational learning theory are the distributionfree model in which examples are chosen from a fixed but arbitrary distribution, and the absolute mistakebound model in which examples are presented in an arbitrary order. Over the Boolean domain , it is ..."
Abstract

Cited by 20 (2 self)
 Add to MetaCart
Two of the most commonly used models in computational learning theory are the distribution-free model, in which examples are chosen from a fixed but arbitrary distribution, and the absolute mistake-bound model, in which examples are presented in an arbitrary order. Over the Boolean domain, it is known that if the learner is allowed unlimited computational resources, then any concept class learnable in one model is also learnable in the other. In addition, any polynomial-time learning algorithm for a concept class in the mistake-bound model can be transformed into one that learns the class in the distribution-free model. This paper ...
Online Learning with Malicious Noise and the Closure Algorithm
 Proc. of the 5th International Workshop on Algorithmic Learning Theory, LNAI, 872
, 1994
"... . We investigate a variant of the online learning model for classes of f0; 1gvalued functions (concepts) in which the labels of a certain amount of the input instances are corrupted by adversarial noise. We propose an extension of a general learning strategy, known as "Closure Algorithm", to this ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
We investigate a variant of the online learning model for classes of {0,1}-valued functions (concepts) in which the labels of a certain amount of the input instances are corrupted by adversarial noise. We propose an extension of a general learning strategy, known as the "Closure Algorithm", to this noise model, and show a worst-case mistake bound of m + (d + 1)K for learning an arbitrary intersection-closed concept class C, where K is the number of noisy labels, d is a combinatorial parameter measuring C's complexity, and m is the worst-case mistake bound of the Closure Algorithm for learning C in the noise-free model. For several concept classes our extended Closure Algorithm is efficient and can tolerate a noise rate equal to the information-theoretic upper bound. We also show how to efficiently turn any algorithm for the online noise model into a learning algorithm for the PAC model with malicious noise.
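The base (noise-free) Closure Algorithm the abstract extends is simple to sketch for one concrete intersection-closed class, intervals on the line: the hypothesis is always the closure of the positive examples seen so far, i.e. the smallest interval containing them. This is an illustrative noise-free sketch only; the paper's contribution is the extension that additionally tolerates up to K adversarially flipped labels.

```python
def closure_algorithm(trials):
    """Online Closure Algorithm for the intersection-closed class of
    1-D intervals.  The hypothesis is the smallest interval (the closure)
    containing all positives seen so far; everything outside it is
    predicted negative.  `trials` is a sequence of (x, label) pairs with
    labels in {0, 1}; returns the number of mistakes made."""
    lo, hi = None, None  # empty closure: predict 0 everywhere
    mistakes = 0
    for x, label in trials:
        prediction = 1 if lo is not None and lo <= x <= hi else 0
        if prediction != label:
            mistakes += 1
        if label == 1:
            # grow the closure just enough to cover the new positive
            lo = x if lo is None else min(lo, x)
            hi = x if hi is None else max(hi, x)
    return mistakes
```

Every mistake on a positive example strictly grows the closure, which for intervals can happen only a bounded number of times per endpoint; that is the mechanism behind the class-complexity parameter d in the bound m + (d + 1)K.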