## New results for learning noisy parities and halfspaces (2006)

### Download From

- IEEE

### Download Links

- www.cc.gatech.edu
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006)

Citations: 45 (12 self)

### BibTeX

@INPROCEEDINGS{Feldman06newresults,
  author    = {Vitaly Feldman and Parikshit Gopalan and Subhash Khot and Ashok Kumar Ponnuswami},
  title     = {New Results for Learning Noisy Parities and Halfspaces},
  booktitle = {Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS)},
  year      = {2006},
  pages     = {563--574}
}


### Abstract

We address well-studied problems concerning the learnability of parities and halfspaces in the presence of classification noise. Learning parities under the uniform distribution with random classification noise, also called the noisy parity problem, is a famous open problem in computational learning. We reduce a number of basic problems regarding learning under the uniform distribution to learning of noisy parities. We show that under the uniform distribution, learning parities with adversarial classification noise reduces to learning parities with random classification noise. Together with the parity learning algorithm of Blum et al. [5], this gives the first nontrivial algorithm for learning parities with adversarial noise. We show that learning of DNF expressions reduces to learning noisy parities on just a logarithmic number of variables. We show that learning of k-juntas reduces to learning noisy parities of k variables. These reductions work even in the presence of random classification noise in the original DNF or junta. We then consider the problem of learning halfspaces over Q^n with adversarial noise, or equivalently finding a halfspace that maximizes the agreement rate with a given set of examples. We prove an essentially optimal hardness factor of 2 − ɛ, improving the factor of 85/84 − ɛ due to Bshouty and Burroughs [8]. Finally, we show that majorities of halfspaces are hard to PAC-learn using any representation, based on the cryptographic assumption underlying the Ajtai-Dwork cryptosystem.
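To make the noisy parity problem concrete, here is a minimal sketch of its example oracle: a uniformly random point in {0, 1}^n labeled by the parity of a hidden subset of coordinates, with the label flipped with probability η. The function name `noisy_parity_oracle` is hypothetical, chosen for illustration; it is not code from the paper.

```python
import random

def noisy_parity_oracle(n, secret, eta, rng=random):
    """Draw one example for the noisy parity problem: a uniform
    x in {0,1}^n labeled by the parity of the bits indexed by
    `secret`, with the label flipped with probability eta."""
    x = [rng.randint(0, 1) for _ in range(n)]
    label = sum(x[i] for i in secret) % 2  # the parity chi_S(x)
    if rng.random() < eta:                 # random classification noise
        label ^= 1
    return x, label

# With eta = 0 the label is exactly the parity of the secret bits.
x, y = noisy_parity_oracle(8, secret=[0, 3, 5], eta=0.0)
assert y == (x[0] + x[3] + x[5]) % 2
```

The learner's task is to recover `secret` from such samples; without noise this is Gaussian elimination, and the entire difficulty of the problem comes from η > 0.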

### Citations

1695 | A Theory of the Learnable
- Valiant
- 1984

Citation Context: ...n statistical distance) to a noisy parity. Learning DNF formulae: Learning of DNF expressions from random examples is another famous open problem formulated in Valiant’s seminal paper on PAC learning [36]. In this problem we are given access to examples of some Boolean function f which are randomly chosen with respect to distribution D, and ɛ > 0. The goal is to find a hypothesis that ɛ-approximates f...

647 | Some optimal inapproximability results
- Håstad

Citation Context: ... input points, the problem of learning parity is intractable in the proper learning setting where the learner must produce a parity as the hypothesis; this follows from a celebrated result of Håstad [18]. We are unaware of non-trivial algorithms for this problem under any fixed distribution, prior to our work. The problem of learning parity with adversarial noise under the uniform distribution is rel...

596 | An Introduction to Computational Learning Theory
- Kearns, Vazirani
- 1994

Citation Context: ...ne, and deciding which halfspace to label ’+’, one can get a 50% success rate. An algorithm with a non-trivial guarantee would imply a PAC-learning algorithm for AC^0 circuits in quasipolynomial time [25]. In contrast, at present the best PAC-learning algorithm even for DNFs runs in time 2^{Õ(n^{1/3})} [29]. Blum et al. [4] observe this connection, and state the question of learning halfspaces with advers...

424 | Selection of relevant features and examples in machine learning
- Blum, Langley
- 1997

Citation Context: ...2 ɛ · T(n, log B, B)^3), where B = Õ(s/ɛ). Learning k-juntas: A Boolean function on n variables is a k-junta if it depends only on k variables out of n. This problem was proposed by Blum and Langley [7], as a clean formulation of the problem of efficient learning in the presence of irrelevant information. In addition, for k ≤ log n, a k-junta can be expressed as a decision tree or a DNF of size n. T...
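The trivial baseline for the junta problem can be sketched as follows: exhaustively try every set of k candidate relevant variables and keep the first one consistent with all labeled examples, which takes roughly n^k time. This `learn_junta` helper is a hypothetical illustration of that baseline, not the O(n^{0.7k}) algorithm of Mossel et al.

```python
from itertools import combinations

def learn_junta(examples, n, k):
    """Exhaustive-search junta learner (the ~n^k baseline): for each
    k-subset of variables, check that the restriction of x to those
    variables determines the label on every example."""
    for subset in combinations(range(n), k):
        table = {}     # maps restricted input -> observed label
        consistent = True
        for x, y in examples:
            key = tuple(x[i] for i in subset)
            # setdefault returns the stored label if the key was seen;
            # a mismatch means this subset cannot be the junta.
            if table.setdefault(key, y) != y:
                consistent = False
                break
        if consistent:
            return subset, table  # relevant variables + truth table
    return None
```

For example, on all 32 labeled points of the 2-junta f(x) = x1 AND x3 over n = 5 variables, the search rejects every 2-subset until it reaches (1, 3).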

423 | Boosting a weak learning algorithm by majority
- Freund
- 1995

Citation Context: ... his breakthrough result on learning DNF expressions with respect to the uniform distribution gives a way to use an algorithm for locating correlated parities and the boosting algorithm due to Freund [12] to build a DNF learning algorithm. We can adapt Jackson’s approach to our setting. We give an outline of the algorithm and omit the now-standard analysis. We view a probability distribution D as a de...

372 | Decision theoretic generalizations of the PAC model for neural net and other learning applications
- Haussler
- 1992

Citation Context: ...er. In the adversarial noise model, an adversary is allowed to flip the labels of an η fraction of the input points. Alternatively, one can view this as learning in the agnostic framework of Haussler [19] and Kearns et al. [23], where we do not make any assumptions about how the data is generated; our goal is to find the best hypothesis from a certain class. 1.1 Learning Noisy Parities Under the Unifor...

366 | A hard-core predicate for all one-way functions
- Goldreich, Levin
- 1989

Citation Context: ...s related to the problem of decoding Hadamard codes. If the learner is allowed to ask membership queries, a celebrated result of Goldreich and Levin gives a polynomial time algorithm for this problem [15]. Later algorithms were given by Kushilevitz and Mansour [33] and Levin [34]. The problem of learning parity in the presence of random noise, or the noisy parity problem, is a notorious open problem in...

306 | Cryptographic limitations on learning boolean formulae and finite automata
- Kearns, Valiant
- 1994

Citation Context: ...l Query model [32]. Based on certain cryptographic assumptions, Kearns and Valiant showed that constant depth threshold circuits cannot be learned over a certain distribution using any representation [24]. Kharitonov strengthened this result by allowing membership queries, and using the uniform distribution [26]. We obtain a hardness result for threshold circuits of depth 2 independent of the hypothes...

207 | A public-key cryptosystem with worst-case/average-case equivalence
- Ajtai, Dwork
- 1997

Citation Context: ...6]. We obtain a hardness result for threshold circuits of depth 2 independent of the hypothesis representation, based on the cryptographic assumption used in the Ajtai-Dwork lattice-based cryptosystem [1]. Theorem 5 Assuming the security of the Ajtai-Dwork cryptosystem, there is no weak PAC-learning algorithm for the concept class of (unweighted) threshold circuits of depth 2. To our knowledge, this i...

194 | Toward efficient agnostic learning
- Kearns, Schapire, et al.
- 1994

Citation Context: ...noise model, an adversary is allowed to flip the labels of an η fraction of the input points. Alternatively, one can view this as learning in the agnostic framework of Haussler [19] and Kearns et al. [23], where we do not make any assumptions about how the data is generated; our goal is to find the best hypothesis from a certain class. 1.1 Learning Noisy Parities Under the Uniform Distribution A parity...

185 | Learning decision trees using the Fourier spectrum
- Kushilevitz, Mansour
- 2005

Citation Context: ...earner is allowed to ask membership queries, a celebrated result of Goldreich and Levin gives a polynomial time algorithm for this problem [15]. Later algorithms were given by Kushilevitz and Mansour [33] and Levin [34]. The problem of learning parity in the presence of random noise, or the noisy parity problem, is a notorious open problem in computational learning theory. Hereafter, by the noisy parit...

167 | Learning in the presence of malicious errors
- Kearns, Li
- 1993

Citation Context: ...is work, we study various kinds of classification noise, where the noise only affects the label of a data point. This is different from the model of malicious errors defined by Valiant [37] (see also [22]), where the noise can affect both the label and the point itself, and thus possibly change the distribution of the data-points. Two natural models for classification noise are random noise and adversa...

165 | An efficient membership-query algorithm for learning DNF with respect to the uniform distribution
- Jackson
- 1997

Citation Context: ..., where DNF-size(f) is the number of terms in the smallest DNF formula for f. The best known algorithm for learning DNFs with respect to the uniform distribution runs in time n^{O(log(s/ɛ))}. Jackson [20] proved that DNFs are learnable under the uniform distribution in polynomial time if the learning algorithm is allowed to make membership queries. We prove that learning DNFs reduces to learning parit...

154 | The hardness of approximate optima in lattices, codes, and systems of linear equations
- Arora, Babai, et al.
- 1993

Citation Context: ...rrectly classifies all the data points, and our goal is to find one that does as well as possible. Bshouty and Burroughs [8] show a hardness factor of 85/84 for this problem. Furthermore, Arora et al. [3] showed that for any constant C, the problem of minimizing the number of points that are wrongly classified by the halfspace is hard to approximate within factor C. The problem we resolve is the follo...

129 | Threshold circuits of bounded depth
- Hajnal, Maass, et al.
- 1993

Citation Context: ... for learning intersections of k-halfspaces typically have running time exponential in k; our results show that this is unavoidable [6, 38, 28]. Finally, using the Discriminator Lemma of Hajnal et al. [17], we show that Theorem 5 implies the hardness of learning halfspaces with adversarial noise of high rate even when the learning algorithm is allowed to output a hypothesis of its choice. Theorem 6 Ass...

118 | Learning disjunction of conjunctions
- Valiant
- 1985

Citation Context: ...is added. In this work, we study various kinds of classification noise, where the noise only affects the label of a data point. This is different from the model of malicious errors defined by Valiant [37] (see also [22]), where the noise can affect both the label and the point itself, and thus possibly change the distribution of the data-points. Two natural models for classification noise are random no...

117 | Noise-tolerant learning, the parity problem, and the statistical query model
- Blum, Kalai, et al.

Citation Context: ...uniform distribution, learning parities with adversarial classification noise reduces to learning parities with random classification noise. Together with the parity learning algorithm of Blum et al. [5], this gives the first nontrivial algorithm for learning parities with adversarial noise. We show that learning of DNF expressions reduces to learning noisy parities on just a logarithmic number of vari...

67 | Learning intersections and thresholds of halfspaces
- Klivans, O’Donnell, et al.

Citation Context: ...obtained independently by Klivans and Sherstov [31]. Known algorithms for learning intersections of k-halfspaces typically have running time exponential in k; our results show that this is unavoidable [6, 38, 28]. Finally, using the Discriminator Lemma of Hajnal et al. [17], we show that Theorem 5 implies the hardness of learning halfspaces with adversarial noise of high rate even when the learning algorithm ...

67 | The Random Projection Method
- Vempala
- 2004

Citation Context: ...sses: a convex polytope is an intersection of halfspaces in R^n, whereas a DNF is a union of halfspaces over {0, 1}^n. There are numerous negative results known for proper learning of such concepts [38, 2], and for learning in the Statistical Query model [32]. Based on certain cryptographic assumptions, Kearns and Valiant showed that constant depth threshold circuits cannot be learned over a certain di...

62 | A polynomial-time algorithm for learning noisy linear threshold functions
- Blum, Frieze, et al.
- 1998

Citation Context: ... problems in machine learning. If such a halfspace does exist, one can find it in polynomial time by Linear Programming. Halfspaces are PAC-learnable even in the presence of random noise: Blum et al. [4] show that a variant of the Perceptron algorithm can be used in this setting. In the adversarial noise scenario, there is no halfspace that correctly classifies all the data points, and our goal is to ...

62 | Agnostically learning halfspaces
- Kalai, Klivans, et al.

Citation Context: ...can extend our hardness result for learning halfspaces to more general concept classes. One possible generalization would be to allow the sign of a low-degree polynomial as a hypothesis. Kalai et al. [21] use this hypothesis class to design algorithms for agnostic learning of halfspaces under some natural distributions. Similarly, for the problem of learning parity with adversarial noise, one could al...

60 | Inapproximability results for MaxClique, Chromatic Number and Min-3Lin-Deletion
- Khot, Ponnuswami

Citation Context: ...tion satisfying even an ɛ fraction. We then reduce this problem to the halfspace problem. Similar ideas were used by Khot and Ponnuswami for equations over Z2 in order to show hardness for Max-Clique [27], though there are some further technical difficulties in working over the reals. This hardness result was proved independently by Guruswami and Raghavendra [16]. Their result holds even if the data p...

50 | An improved boosting algorithm and its implications on learning complexity
- Freund
- 1992

Citation Context: ...rity algorithm [12] is used to produce distribution functions bounded by O(ɛ^{−(2+ρ)}) (for arbitrarily small constant ρ). Recently, Klivans and Servedio observed [30] that a later algorithm by Freund [13] produces distribution functions bounded by Õ(ɛ). Combining these results we get the proof of Theorem 2. 3.3 Learning Juntas For the class of k-juntas, we can get a simpler reduction with better param...

29 | On using extended statistical queries to avoid membership queries
- Bshouty, Feldman
- 2002

Citation Context: ...hm and omit the now-standard analysis. We view a probability distribution D as a density function and define its L∞ norm. Jackson’s algorithm is based on the following Lemma (we use a refinement from [9]). Lemma 3 ([9], Lemma 18) For any Boolean function f of DNF-size s and any distribution D over {0, 1}^n there exists a parity function χ_a such that |E_D[fχ_a]| ≥ 1/(2s+1) and weight(a) ≤ log((2s + 1)L∞(2...

29 | Hardness of learning halfspaces with noise
- Guruswami, Raghavendra
- 2006

Citation Context: ... order to show hardness for Max-Clique [27], though there are some further technical difficulties in working over the reals. This hardness result was proved independently by Guruswami and Raghavendra [16]. Their result holds even if the data points are restricted to lie in {0, 1}^n. 1.2.1 Learning Thresholds of Halfspaces As opposed to proper-learning results, one could hope to show that a certain co...

28 | Learning juntas
- Mossel, O’Donnell, et al.
- 2003

Citation Context: ...Thus, learning juntas is a first step towards learning polynomial size decision trees and DNFs under the uniform distribution. The first non-trivial algorithm was given only recently by Mossel et al. [35], and runs in time roughly O(n^{0.7k}). However, even the question of whether one can learn k-juntas in polynomial time for k = ω(1) still remains open. For the problem of learning k-juntas, we give a ...

26 | Learnability and automatizability
- Alekhnovich, Braverman, et al.
- 2004

Citation Context: ...sses: a convex polytope is an intersection of halfspaces in R^n, whereas a DNF is a union of halfspaces over {0, 1}^n. There are numerous negative results known for proper learning of such concepts [38, 2], and for learning in the Statistical Query model [32]. Based on certain cryptographic assumptions, Kearns and Valiant showed that constant depth threshold circuits cannot be learned over a certain di...

24 | Cryptographic lower bounds for learnability of boolean functions on the uniform distribution
- Kharitonov
- 1995

Citation Context: ... threshold circuits cannot be learned over a certain distribution using any representation [24]. Kharitonov strengthened this result by allowing membership queries, and using the uniform distribution [26]. We obtain a hardness result for threshold circuits of depth 2 independent of the hypothesis representation, based on the cryptographic assumption used in the Ajtai-Dwork lattice-based cryptosystem [1...

21 | Randomness and non-determinism
- Levin
- 1993

Citation Context: ...ed to ask membership queries, a celebrated result of Goldreich and Levin gives a polynomial time algorithm for this problem [15]. Later algorithms were given by Kushilevitz and Mansour [33] and Levin [34]. The problem of learning parity in the presence of random noise, or the noisy parity problem, is a notorious open problem in computational learning theory. Hereafter, by the noisy parity problem, we w...

20 | Eliminating decryption errors in the Ajtai-Dwork cryptosystem
- Goldreich, Goldwasser, et al.
- 1997

Citation Context: ... any kind. This result follows the general outline for proving inherent unpredictability of [24]. We show that the decryption of a modification of the Ajtai-Dwork cryptosystem [1] by Goldreich et al. [14] can be done by a depth-2 threshold circuit. This result was obtained independently by Klivans and Sherstov [31]. Known algorithms for learning intersections of k-halfspaces typically have running tim...

17 | Learning an intersection of a constant number of halfspaces under a uniform distribution
- Blum, Kannan
- 1997

Citation Context: ...obtained independently by Klivans and Sherstov [31]. Known algorithms for learning intersections of k-halfspaces typically have running time exponential in k; our results show that this is unavoidable [6, 38, 28]. Finally, using the Discriminator Lemma of Hajnal et al. [17], we show that Theorem 5 implies the hardness of learning halfspaces with adversarial noise of high rate even when the learning algorithm ...

17 | Boosting and hard-core set construction
- Klivans, Servedio
- 2003

Citation Context: ...’s result [20], Freund’s boost-by-majority algorithm [12] is used to produce distribution functions bounded by O(ɛ^{−(2+ρ)}) (for arbitrarily small constant ρ). Recently, Klivans and Servedio observed [30] that a later algorithm by Freund [13] produces distribution functions bounded by Õ(ɛ). Combining these results we get the proof of Theorem 2. 3.3 Learning Juntas For the class of k-juntas, we can get...

12 | Improved lower bounds for learning intersections of halfspaces
- Klivans, Sherstov
- 2006

Citation Context: ...in R^n, whereas a DNF is a union of halfspaces over {0, 1}^n. There are numerous negative results known for proper learning of such concepts [38, 2], and for learning in the Statistical Query model [32]. Based on certain cryptographic assumptions, Kearns and Valiant showed that constant depth threshold circuits cannot be learned over a certain distribution using any representation [24]. Kharitonov s...

10 | Maximizing agreements and coagnostic learning
- Bshouty, Burroughs
- 2006

Citation Context: ...ng a halfspace that maximizes the agreement rate with a given set of examples. We prove an essentially optimal hardness factor of 2 − ɛ, improving the factor of 85/84 − ɛ due to Bshouty and Burroughs [8]. Finally, we show that majorities of halfspaces are hard to PAC-learn using any representation, based on the cryptographic assumption underlying the Ajtai-Dwork cryptosystem. ∗ Supported by grants fro...

7 | Attribute Efficient and Non-adaptive Learning of Parities and DNF Expressions
- Feldman
- 2007

Citation Context: ...le since Freund’s boosting algorithms do not necessarily work in the presence of noise; in particular, Jackson’s original algorithm does not handle noisy DNFs. Nevertheless, using ideas due to Feldman [10], we give a suitable generalization of WP-R that can handle this case. The details can be found in the full version [11]. 4 Hardness of Learning a Halfspace with Adversarial Noise The following is the...

1 | Learning DNF in time 2^{Õ(n^{1/3})}
- Klivans, Servedio
- 2001

Citation Context: ...on-trivial guarantee would imply a PAC-learning algorithm for AC^0 circuits in quasipolynomial time [25]. In contrast, at present the best PAC-learning algorithm even for DNFs runs in time 2^{Õ(n^{1/3})} [29]. Blum et al. [4] observe this connection, and state the question of learning halfspaces with adversarial noise as an important open problem. We prove a negative result which is essentially optimal. W...

1 | Cryptographic hardness results for learning intersections of halfspaces
- Klivans, Sherstov
- 2006

Citation Context: ...he decryption of a modification of the Ajtai-Dwork cryptosystem [1] by Goldreich et al. [14] can be done by a depth-2 threshold circuit. This result was obtained independently by Klivans and Sherstov [31]. Known algorithms for learning intersections of k-halfspaces typically have running time exponential in k; our results show that this is unavoidable [6, 38, 28]. Finally, using the Discriminator Lemma...