Results 1–10 of 10
Learning From a Consistently Ignorant Teacher
, 1994
Abstract

Cited by 22 (8 self)
One view of computational learning theory is that of a learner acquiring the knowledge of a teacher. We introduce a formal model of learning capturing the idea that teachers may have gaps in their knowledge. In particular, we consider learning from a teacher who labels examples "+" (a positive instance of the concept being learned), "−" (a negative instance of the concept being learned), and "?" (an instance with unknown classification), in such a way that knowledge of the concept class and all the positive and negative examples is not sufficient to determine the labelling of any of the examples labelled with "?". The goal of the learner is not to compensate for the ignorance of the teacher by attempting to infer "+" or "−" labels for the examples labelled with "?", but rather to learn (an approximation to) the ternary labelling presented by the teacher. Thus, the goal of the learner is still to acquire the knowledge of the teacher, but now the learner must also ...
PAC Learning Intersections of Halfspaces with Membership Queries
 ALGORITHMICA
, 1998
Abstract

Cited by 21 (1 self)
A randomized learning algorithm Polly is presented that efficiently learns intersections of s halfspaces in n dimensions, in time polynomial in both s and n. The learning protocol is the "PAC" (probably approximately correct) model of Valiant, augmented with membership queries. In particular, Polly receives a set S of m = poly(n, s, 1/ε, 1/δ) randomly generated points from an arbitrary distribution over the unit hypercube, and is told exactly which points are contained in, and which points are not contained in, the convex polyhedron P defined by the halfspaces. Polly may also obtain the same information about points of its own choosing. It is shown that after poly(n, s, 1/ε, 1/δ, log(1/d)) time, the probability that Polly fails to output a collection of s halfspaces with classification error at most ε is at most δ. Here, d is the minimum distance between the boundary of the target and those examples in S that are not lying on the boundary. The parameter log(1/d) can be ...
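The protocol described above combines a random-example oracle over the unit hypercube with a membership oracle for the target polyhedron. A minimal sketch of those two interfaces (the halfspace representation `(w, b)` meaning w·x ≤ b, and the function names, are illustrative assumptions, not the paper's notation):

```python
import random

def make_polyhedron(halfspaces):
    """halfspaces: list of (w, b); x lies in P iff w·x <= b for every pair."""
    def member(x):
        return all(sum(wi * xi for wi, xi in zip(w, x)) <= b
                   for w, b in halfspaces)
    return member

def random_example(n, member):
    """EX oracle: a uniform point from [0,1]^n together with its label."""
    x = [random.random() for _ in range(n)]
    return x, member(x)

# The learner may also call `member` on points of its own choosing --
# exactly the membership-query augmentation described above.
member = make_polyhedron([([1.0, 0.0], 0.5), ([0.0, 1.0], 0.5)])
x, label = random_example(2, member)
assert label == member(x)
assert member([0.1, 0.1]) and not member([0.9, 0.1])
```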
PAC Learning with Nasty Noise
 Theoretical Computer Science
, 1999
Abstract

Cited by 17 (0 self)
We introduce a new model for learning in the presence of noise, which we call the Nasty Noise model. This model generalizes previously considered models of learning with noise. The learning process in this model, which is a variant of the PAC model, proceeds as follows. Suppose that the learning algorithm during its execution asks for m examples. The examples that the algorithm gets are generated by a nasty adversary that works according to the following steps. First, the adversary chooses m examples independently according to the fixed (but unknown to the learning algorithm) distribution D, as in the PAC model. Then the powerful adversary, upon seeing the specific m examples that were chosen (and using its knowledge of the target function, the distribution D, and the learning algorithm), is allowed to remove a fraction of the examples of its choosing and replace them with the same number of arbitrary examples; the m modified examples are then given to the learning algorithm. The only restriction on the adversary is that the number of examples the adversary is allowed to modify should be distributed according to a binomial distribution with parameters η (the noise rate) and m. On the negative side, we prove that no algorithm can achieve accuracy of ɛ < 2η in learning ...
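The sampling process above can be simulated directly. In this sketch the function names and the concrete adversary are illustrative assumptions, not the paper's: the adversary sees all m clean draws and may modify k ~ Binomial(m, η) of them arbitrarily.

```python
import random

def nasty_sample(m, eta, draw, target, adversary):
    """Return m examples after nasty-noise corruption.

    draw()              -- samples x from the unknown distribution D
    target(x)           -- the true label f(x)
    adversary(clean, k) -- returns the sample with at most k examples changed
    """
    clean = [(x, target(x)) for x in (draw() for _ in range(m))]
    k = sum(random.random() < eta for _ in range(m))  # k ~ Binomial(m, eta)
    return adversary(clean, k)

# A crude adversary: flip the labels of the k points nearest the threshold.
def flip_nearest(clean, k):
    order = sorted(range(len(clean)), key=lambda i: abs(clean[i][0] - 0.5))
    flips = set(order[:k])
    return [(x, (not y) if i in flips else y)
            for i, (x, y) in enumerate(clean)]

sample = nasty_sample(200, 0.1, random.random, lambda x: x > 0.5, flip_nearest)
assert len(sample) == 200
```

Note that the adversary is applied after the clean draws are fixed, which is exactly what makes this model stronger than malicious-noise models where corruptions are injected draw by draw.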
Learning of Depth Two Neural Networks with Constant Fanin at the Hidden Nodes (Extended Abstract)
 In Proc. 9th Annu. Conf. on Comput. Learning Theory
, 1996
Abstract

Cited by 9 (1 self)
We present algorithms for learning depth two neural networks where the hidden nodes are threshold gates with constant fanin. The transfer function of the output node might be more general: we have results for the cases when the threshold function, the logistic function or the identity function is used as the transfer function at the output node. We give batch and online learning algorithms for these classes of neural networks and prove bounds on the performance of our algorithms. The batch algorithms work for real valued inputs whereas the online algorithms assume that the inputs are discretized. The hypotheses of our algorithms are essentially also neural networks of depth two. However, their number of hidden nodes might be much larger than the number of hidden nodes of the neural network that has to be learned. Our algorithms can handle such a large number of hidden nodes since they rely on multiplicative weight updates at the output node, and the performance of these algorithms s...
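The multiplicative weight updates at the output node that the abstract alludes to are Winnow-style rules. A generic sketch of such a rule (this is not the paper's algorithm; the threshold n/2 and the promotion factor α are illustrative choices):

```python
def winnow(examples, n, alpha=2.0):
    """Online Winnow over n boolean features; returns the final weights."""
    w = [1.0] * n
    theta = n / 2
    for x, y in examples:
        pred = sum(wi for wi, xi in zip(w, x) if xi) >= theta
        if pred != y:
            # Multiplicative promotion/demotion on the active features only.
            factor = alpha if y else 1.0 / alpha
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
    return w

# Toy run: the target is the single relevant feature x[0].
examples = [([1, 0, 0, 0], True), ([0, 1, 1, 0], False),
            ([1, 0, 0, 0], True), ([0, 1, 1, 1], False)]
w = winnow(examples, 4)
assert w[0] == max(w)
```

The appeal of such updates in this setting is that the mistake bound grows only logarithmically in the number of candidate hidden units, which is what lets the hypothesis carry many more hidden nodes than the target network.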
A Composition Theorem for Learning Algorithms with Applications to Geometric Concept Classes
 In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC
, 1997
Abstract

Cited by 7 (5 self)
This paper solves the open problem of exactly learning geometric objects bounded by hyperplanes (and, more generally, by any constant-degree algebraic surfaces) in constant-dimensional space from equivalence queries only (i.e., in the online learning model). We present a novel approach that allows, under certain conditions, the composition of learning algorithms for simple classes into an algorithm for a more complicated class. Informally speaking, it shows that if a class of concepts C is learnable in time t using small space, then C*, the class of all functions of the form f(g_1, ..., g_m) with g_1, ..., g_m ∈ C and any f, is learnable in time polynomial in t and m. We then show that the class of halfspaces in a fixed-dimensional space is learnable with small space.

1 Introduction

Littlestone's online learning model [L88, L89] is one of the major models of learning. Learnability in this model implies learnability in Valiant's PAC model [Val84], and is equivalent to l...
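The composed class C* described above contains concepts of the form f(g_1, ..., g_m) with each g_i drawn from a base class. A toy instance, with halfspaces in the plane as the base class and an arbitrary boolean combiner f (names here are illustrative):

```python
def halfspace(w, b):
    """The concept x -> (w·x >= b)."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) >= b

def compose(f, gs):
    """The composed concept x -> f(g_1(x), ..., g_m(x))."""
    return lambda x: f(*(g(x) for g in gs))

# A triangle-like region as the AND of three halfspaces -- the kind of
# geometric object the composition theorem makes exactly learnable.
g1 = halfspace([1, 0], 0)      # x >= 0
g2 = halfspace([0, 1], 0)      # y >= 0
g3 = halfspace([-1, -1], -1)   # x + y <= 1
triangle = compose(lambda a, b, c: a and b and c, [g1, g2, g3])
assert triangle([0.2, 0.2]) and not triangle([0.9, 0.9])
```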
When Can Two Unsupervised Learners Achieve PAC Separation?
 Procs. of COLT/EUROCOLT, LNAI 2111
, 2001
Abstract

Cited by 5 (1 self)
In this paper we study a new restriction of the PAC learning framework, in which each label class is handled by an unsupervised learner that aims to fit an appropriate probability distribution to its own data. A hypothesis is derived by choosing, for any unlabeled instance, the label whose distribution assigns it the higher likelihood. The motivation for the new learning setting is that the general approach of fitting separate distributions to each label class is often used in practice for classification problems. The set of probability distributions that is obtained is more useful than a collection of decision boundaries. A question that arises, however, is whether it is ever more tractable (in terms of computational complexity or the sample size required) to find a simple decision boundary than to divide the problem up into separate unsupervised learning problems and find appropriate distributions. Within the framework, we give algorithms for learning various simple geometric concept classes. In the boolean domain we show how to learn parity functions, and functions having a constant upper bound on the number of relevant attributes. These results distinguish the new setting from various other well-known restrictions of PAC learning. We give an algorithm for learning monomials over input vectors generated by an unknown product distribution. The main open problem is whether monomials (or any other concept class) distinguish learnability in this framework from standard PAC learnability.
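The setting above is easy to sketch: each learner fits a distribution to its own class's data, and the hypothesis labels a point by whichever fitted density assigns it the higher likelihood. Gaussians are an illustrative choice of distribution family, not one the paper mandates:

```python
import math

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of a 1-d sample."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs) or 1e-9
    return mu, var

def loglik(x, params):
    mu, var = params
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def separator(pos_xs, neg_xs):
    """Each unsupervised learner sees only its own class's data; the
    hypothesis simply compares the two fitted densities."""
    p, n = fit_gaussian(pos_xs), fit_gaussian(neg_xs)
    return lambda x: loglik(x, p) >= loglik(x, n)

h = separator([0.9, 1.0, 1.1], [-1.1, -1.0, -0.9])
assert h(1.2) and not h(-1.2)
```

The implied decision boundary is wherever the two densities cross, which is why the distributional view carries strictly more information than the boundary alone.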
Learning Boxes in High Dimension
 Algorithmica
, 1997
Abstract

Cited by 4 (3 self)
We present exact learning algorithms that learn several classes of (discrete) boxes in {0, ..., ℓ−1}^n. In particular we learn: (1) The class of unions of O(log n) boxes in time poly(n, log ℓ) (solving an open problem of [16, 12]; in [3] this class is shown to be learnable in time poly(n, ℓ)). (2) The class of unions of disjoint boxes in time poly(n, t, log ℓ), where t is the number of boxes. (Previously this was known only in the case where all boxes are disjoint in one of the dimensions; in [3] this class is shown to be learnable in time poly(n, t, ℓ).) In particular, our algorithm learns the class of decision trees over n variables that take values in {0, ..., ℓ−1}, with comparison nodes, in time poly(n, t, log ℓ), where t is the number of leaves (this was an open problem in [9] which was shown in [4] to be learnable in time poly(n, t, ℓ)). (3) The class of unions of O(1)-degenerate boxes (that is, boxes that depend only on O(1) variables) in time poly(n, t, ...
Learning Fixed-dimension Linear Thresholds From Fragmented Data
 in Procs of the 1999 Conference on Computational Learning Theory
, 1999
Abstract

Cited by 2 (2 self)
We investigate PAC learning in a situation in which examples (consisting of an input vector and a 0/1 label) have some of the components of the input vector concealed from the learner. This is a special case of Restricted Focus of Attention (RFA) learning. Our interest here is in 1-RFA learning, where only a single component of an input vector is given for each example. We argue that 1-RFA learning merits special consideration within the wider field of RFA learning. It is the most restrictive form of RFA learning (so that positive results apply in general), and it models a typical "data fusion" scenario, where we have sets of observations from a number of separate sensors, but these sensors are uncorrelated sources. Within this setting we study the well-known class of linear threshold functions, the characteristic functions of Euclidean halfspaces. The sample complexity (i.e. the sample-size requirement as a function of the parameters) of this learning problem is affected by the input distri...
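The 1-RFA example oracle described above reveals the label together with just one named component of each fresh example. A minimal sketch of that interface (the function names, and the convention that the learner picks the focus index per query, are illustrative assumptions):

```python
import random

def one_rfa_oracle(draw, target):
    """Build a 1-RFA oracle: query(i) draws a fresh example and reveals
    only component i of the input along with the full label."""
    def query(i):
        x = draw()
        return i, x[i], target(x)
    return query

# Toy target: a linear threshold over two components.
draw = lambda: [random.random(), random.random()]
target = lambda x: x[0] + x[1] >= 1.0
query = one_rfa_oracle(draw, target)
i, xi, y = query(0)
assert i == 0 and 0.0 <= xi <= 1.0 and isinstance(y, bool)
```

Each query consumes a fresh draw, so the learner can never see two components of the same example, which is what makes this the most restrictive RFA variant.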
Verification as Learning Geometric Concepts
Abstract

Cited by 1 (1 self)
We formalize the problem of program verification as a learning problem, showing that invariants in program verification can be regarded as geometric concepts in machine learning. Safety properties define bad states: states a program should not reach. Program verification explains why a program’s set of reachable states is disjoint from the set of bad states. In Hoare Logic, these explanations are predicates that form inductive assertions. Using samples of reachable and bad states and applying well-known machine learning algorithms for classification, we are able to generate inductive assertions. By relaxing the search for an exact proof to classifiers, we obtain complexity-theoretic improvements. Further, we extend the learning algorithm to obtain a sound procedure that can generate proofs containing invariants that are arbitrary boolean combinations of polynomial inequalities. We have evaluated our approach on a number of challenging benchmarks and the results are promising.
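The reduction described above treats reachable states as positive samples and bad states as negative samples, so any classifier separating them is a candidate invariant. A toy sketch with a perceptron standing in for the paper's more general learners (the example loop program and all names here are assumptions for illustration):

```python
def perceptron(pos, neg, dims, epochs=100):
    """Find w, b with w·x + b > 0 on pos and <= 0 on neg (if separable)."""
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        for x, y in [(p, 1) for p in pos] + [(n, -1) for n in neg]:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Mistake-driven update toward the misclassified point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Reachable states of a toy loop `x = 0; while *: x += 1` stay non-negative;
# the bad states satisfy x < 0, so "x >= 0" is a separating invariant.
reach = [[0.0], [1.0], [2.0], [3.0]]
bad = [[-1.0], [-2.0], [-3.0]]
w, b = perceptron(reach, bad, 1)
assert all(w[0] * x + b > 0 for (x,) in reach)
assert all(w[0] * x + b <= 0 for (x,) in bad)
```

Soundness in the paper comes from checking the learned predicate against the program, not from the classifier alone; this sketch only shows the sampling-and-separating step.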