Results 1  10
of
288
Cryptographic Limitations on Learning Boolean Formulae and Finite Automata
 PROCEEDINGS OF THE TWENTYFIRST ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 1989
"... In this paper we prove the intractability of learning several classes of Boolean functions in the distributionfree model (also called the Probably Approximately Correct or PAC model) of learning from examples. These results are representation independent, in that they hold regardless of the syntact ..."
Abstract

Cited by 312 (15 self)
 Add to MetaCart
In this paper we prove the intractability of learning several classes of Boolean functions in the distributionfree model (also called the Probably Approximately Correct or PAC model) of learning from examples. These results are representation independent, in that they hold regardless of the syntactic form in which the learner chooses to represent its hypotheses. Our methods reduce the problems of cracking a number of wellknown publickey cryptosystems to the learning problems. We prove that a polynomialtime learning algorithm for Boolean formulae, deterministic finite automata or constantdepth threshold circuits would have dramatic consequences for cryptography and number theory: in particular, such an algorithm could be used to break the RSA cryptosystem, factor Blum integers (composite numbers equivalent to 3 modulo 4), and detect quadratic residues. The results hold even if the learning algorithm is only required to obtain a slight advantage in prediction over random guessing. The techniques used demonstrate an interesting duality between learning and cryptography. We also apply our results to obtain strong intractability results for approximating a generalization of graph coloring.
Efficient noisetolerant learning from statistical queries
 JOURNAL OF THE ACM
, 1998
"... In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from stat ..."
Abstract

Cited by 301 (5 self)
 Add to MetaCart
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the informationtheoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noisetolerant algorithm is known.
Toward efficient agnostic learning
 In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory
, 1992
"... Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtua ..."
Abstract

Cited by 201 (7 self)
 Add to MetaCart
Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
Learning Decision Trees using the Fourier Spectrum
, 1991
"... This work gives a polynomial time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional boolean decision tree model that allows linear operations in each ..."
Abstract

Cited by 189 (10 self)
 Add to MetaCart
This work gives a polynomial time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional boolean decision tree model that allows linear operations in each node (i.e., summation of a subset of the input variables over GF (2)). This paper shows how to learn in polynomial time any function that can be approximated (in norm L 2 ) by a polynomially sparse function (i.e., a function with only polynomially many nonzero Fourier coefficients). The authors demonstrate that any function f whose L 1 norm (i.e., the sum of absolute value of the Fourier coefficients) is polynomial can be approximated by a polynomially sparse function, and prove that boolean decision trees with linear operations are a subset of this class of functions. Moreover, it is shown that the functions with polynomial L 1 norm can be learned deterministically. The algorithm can a...
An Efficient MembershipQuery Algorithm for Learning DNF with Respect to the Uniform Distribution
, 1994
"... We present a membershipquery algorithm for efficiently learning DNF with respect to the uniform distribution. In fact, the algorithm properly learns with respect to uniform the class TOP of Boolean functions expressed as a majority vote over parity functions. We also describe extensions of this alg ..."
Abstract

Cited by 164 (13 self)
 Add to MetaCart
We present a membershipquery algorithm for efficiently learning DNF with respect to the uniform distribution. In fact, the algorithm properly learns with respect to uniform the class TOP of Boolean functions expressed as a majority vote over parity functions. We also describe extensions of this algorithm for learning DNF over certain nonuniform distributions and for learning a class of geometric concepts that generalizes DNF. Furthermore, we show that DNF is weakly learnable with respect to uniform from noisy examples. Our strong learning algorithm utilizes one of Freund's boosting techniques and relies on the fact that boosting does not require a completely distributionindependent weak learner. The boosted weak learner is a nonuniform extension of a parityfinding algorithm discovered by Goldreich and Levin. 3 1 Introduction Consider the following 20questionslike game between two players, Bob and Alice. Bob has a Disjunctive Normal Form (DNF) expression f in mind. Alice is allo...
Numbertheoretic constructions of efficient pseudorandom functions
 In 38th Annual Symposium on Foundations of Computer Science
, 1997
"... ..."
Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis
 IN PROCEEDINGS OF THE TWENTYSIXTH ANNUAL SYMPOSIUM ON THEORY OF COMPUTING
, 1994
"... We present new results on the wellstudied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learn ..."
Abstract

Cited by 123 (23 self)
 Add to MetaCart
We present new results on the wellstudied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the rst positive result known for learning general DNF in polynomial time in a nontrivial model. Our results should be contrasted with those of Kharitonov [12], who proved that AC 0 is not eciently learnable in this model based on cryptographic assumptions. We also present ecient learning algorithms in various models for the readk and SATk subclasses of DNF. We then turn our attention to the recently introduced statistical query model of learning [9]. This model is a restricted version of the popular Probably Approximately Correct (PAC) model, and practically every PAC learning algorithm falls into the statistical query model [9]. We prove that DNF and decision trees are not even weakly learnable in polynomial time in this model. This result is informationtheoretic and therefore does not rely on any unproven assumptions, and demonstrates that no straightforward modication of the existing algorithms for learning various restricted forms of DNF and decision trees will solve the general problem. These lower bounds are a corollary of a more general characterization of the complexity of statistical query learning in terms of the number of uncorrelated functions in the concept class. The underlying tool for all of our results is the Fourier analysis of the concept class to be learned.
The Expressive Power of Voting Polynomials
 Combinatorica
, 1993
"... We consider the problem of approximating a Boolean function f : f0; 1g n ! f0; 1g by the sign of an integer polynomial p of degree k. For us, a polynomial p(x) predicts the value of f(x) if, whenever p(x) 0, f(x) = 1, and whenever p(x) ! 0, f(x) = 0. A lowdegree polynomial p is a good approxima ..."
Abstract

Cited by 91 (9 self)
 Add to MetaCart
We consider the problem of approximating a Boolean function f : f0; 1g n ! f0; 1g by the sign of an integer polynomial p of degree k. For us, a polynomial p(x) predicts the value of f(x) if, whenever p(x) 0, f(x) = 1, and whenever p(x) ! 0, f(x) = 0. A lowdegree polynomial p is a good approximator for f if it predicts f at almost all points. Given a positive integer k, and a Boolean function f , we ask, "how good is the best degree k approximation to f?" We introduce a new lower bound technique which applies to any Boolean function. We show that the lower bound technique yields tight bounds in the case f is parity. Minsky and Papert [10] proved that a perceptron can not compute parity; our bounds indicate exactly how well Yale University, Dept. of Computer Science, P.O. Box 208285, New Haven CT 065208285. y Email: aspnesjames@cs.yale.edu. z Email: beigelrichard@cs.yale.edu. Supported in part by NSF grants CCR8808949 and CCR8958528. x CarnegieMellon University, Schoo...
Scaling learning algorithms towards AI
, 2007
"... One longterm goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Le ..."
Abstract

Cited by 85 (20 self)
 Add to MetaCart
One longterm goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to nonparametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex highdimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semisupervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lowerlevel features or concepts are progressively combined into more abstract and higherlevel representations. We argue that deep architectures have the potential to generalize in nonlocal ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence. 1 1
E±cient cryptographic schemes provably as secure as subset sum
 Proc. 30th IEEE Symposium on Foundations of Computer Science
, 1989
"... ..."