Results 1  10
of
177
The strength of weak learnability
 Machine Learning
, 1990
"... Abstract. This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distributionfree (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a Source of examples of the unknown concept, the learner with h ..."
Abstract

Cited by 791 (22 self)
 Add to MetaCart
Abstract. This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distributionfree (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a Source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error e.
Learning quickly when irrelevant attributes abound: A new linearthreshold algorithm
 Machine Learning
, 1988
"... learning Boolean functions, linearthreshold algorithms Abstract. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each ex ..."
Abstract

Cited by 738 (5 self)
 Add to MetaCart
(Show Context)
learning Boolean functions, linearthreshold algorithms Abstract. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linearthreshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space. 1.
Learning Decision Lists
, 2001
"... This paper introduces a new representation for Boolean functions, called decision lists, and shows that they are efficiently learnable from examples. More precisely, this result is established for \kDL" { the set of decision lists with conjunctive clauses of size k at each decision. Since k ..."
Abstract

Cited by 413 (0 self)
 Add to MetaCart
This paper introduces a new representation for Boolean functions, called decision lists, and shows that they are efficiently learnable from examples. More precisely, this result is established for \kDL" { the set of decision lists with conjunctive clauses of size k at each decision. Since kDL properly includes other wellknown techniques for representing Boolean functions such as kCNF (formulae in conjunctive normal form with at most k literals per clause), kDNF (formulae in disjunctive normal form with at most k literals per term), and decision trees of depth k, our result strictly increases the set of functions which are known to be polynomially learnable, in the sense of Valiant (1984). Our proof is constructive: we present an algorithm which can efficiently construct an element of kDL consistent with a given set of examples, if one exists.
Cryptographic Limitations on Learning Boolean Formulae and Finite Automata
 PROCEEDINGS OF THE TWENTYFIRST ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 1989
"... In this paper we prove the intractability of learning several classes of Boolean functions in the distributionfree model (also called the Probably Approximately Correct or PAC model) of learning from examples. These results are representation independent, in that they hold regardless of the syntact ..."
Abstract

Cited by 342 (15 self)
 Add to MetaCart
In this paper we prove the intractability of learning several classes of Boolean functions in the distributionfree model (also called the Probably Approximately Correct or PAC model) of learning from examples. These results are representation independent, in that they hold regardless of the syntactic form in which the learner chooses to represent its hypotheses. Our methods reduce the problems of cracking a number of wellknown publickey cryptosystems to the learning problems. We prove that a polynomialtime learning algorithm for Boolean formulae, deterministic finite automata or constantdepth threshold circuits would have dramatic consequences for cryptography and number theory: in particular, such an algorithm could be used to break the RSA cryptosystem, factor Blum integers (composite numbers equivalent to 3 modulo 4), and detect quadratic residues. The results hold even if the learning algorithm is only required to obtain a slight advantage in prediction over random guessing. The techniques used demonstrate an interesting duality between learning and cryptography. We also apply our results to obtain strong intractability results for approximating a generalization of graph coloring.
Efficient noisetolerant learning from statistical queries
 JOURNAL OF THE ACM
, 1998
"... In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from stat ..."
Abstract

Cited by 333 (5 self)
 Add to MetaCart
(Show Context)
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the informationtheoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noisetolerant algorithm is known.
Toward efficient agnostic learning
 In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory
, 1992
"... Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtua ..."
Abstract

Cited by 219 (7 self)
 Add to MetaCart
(Show Context)
Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
Training a 3Node Neural Network is NPComplete
, 1992
"... We consider a 2layer, 3node, ninput neural network whose nodes compute linear threshold functions of their inputs. We show that it is NPcomplete to decide whether there exist weights and thresholds for this network so that it produces output consistent with a given set of training examples. We ..."
Abstract

Cited by 215 (3 self)
 Add to MetaCart
(Show Context)
We consider a 2layer, 3node, ninput neural network whose nodes compute linear threshold functions of their inputs. We show that it is NPcomplete to decide whether there exist weights and thresholds for this network so that it produces output consistent with a given set of training examples. We extend the result to other simple networks. We also present a network for which training is hard but where switching to a more powerful representation makes training easier. These results suggest that those looking for perfect training algorithms cannot escape inherent computational difficulties just by considering only simple or very regular networks. They also suggest the importance, given a training problem, of finding an appropriate network and input encoding for that problem. It is left as an open problem to extend our result to nodes with nonlinear functions such as sigmoids.
Learning in the Presence of Malicious Errors
 SIAM Journal on Computing
, 1993
"... In this paper we study an extension of the distributionfree model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an advers ..."
Abstract

Cited by 182 (12 self)
 Add to MetaCart
(Show Context)
In this paper we study an extension of the distributionfree model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worstcase model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems. 1 Introduction In this paper, we study a practical extension to Valiant's distributionfree model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distributionfree model typically makes the idealize...
An Efficient MembershipQuery Algorithm for Learning DNF with Respect to the Uniform Distribution
, 1994
"... We present a membershipquery algorithm for efficiently learning DNF with respect to the uniform distribution. In fact, the algorithm properly learns with respect to uniform the class TOP of Boolean functions expressed as a majority vote over parity functions. We also describe extensions of this alg ..."
Abstract

Cited by 175 (13 self)
 Add to MetaCart
We present a membershipquery algorithm for efficiently learning DNF with respect to the uniform distribution. In fact, the algorithm properly learns with respect to uniform the class TOP of Boolean functions expressed as a majority vote over parity functions. We also describe extensions of this algorithm for learning DNF over certain nonuniform distributions and for learning a class of geometric concepts that generalizes DNF. Furthermore, we show that DNF is weakly learnable with respect to uniform from noisy examples. Our strong learning algorithm utilizes one of Freund's boosting techniques and relies on the fact that boosting does not require a completely distributionindependent weak learner. The boosted weak learner is a nonuniform extension of a parityfinding algorithm discovered by Goldreich and Levin. 3 1 Introduction Consider the following 20questionslike game between two players, Bob and Alice. Bob has a Disjunctive Normal Form (DNF) expression f in mind. Alice is allo...
Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis
 IN PROCEEDINGS OF THE TWENTYSIXTH ANNUAL SYMPOSIUM ON THEORY OF COMPUTING
, 1994
"... We present new results on the wellstudied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learn ..."
Abstract

Cited by 132 (23 self)
 Add to MetaCart
We present new results on the wellstudied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learning general DNF in polynomial time in a nontrivial model. Our results should be contrasted with those of Kharitonov [12], who proved that AC 0 is not eciently learnable in this model based on cryptographic assumptions. We also present ecient learning algorithms in various models for the readk and SATk subclasses of DNF. We then turn our attention to the recently introduced statistical query model of learning [9]. This model is a restricted version of the popular Probably Approximately Correct (PAC) model, and practically every PAC learning algorithm falls into the statistical query model [9]. We prove that DNF and decision trees are not even weakly learnable in polynomial time in this model. This result is informationtheoretic and therefore does not rely on any unproven assumptions, and demonstrates that no straightforward modication of the existing algorithms for learning various restricted forms of DNF and decision trees will solve the general problem. These lower bounds are a corollary of a more general characterization of the complexity of statistical query learning in terms of the number of uncorrelated functions in the concept class. The underlying tool for all of our results is the Fourier analysis of the concept class to be learned.