Hardness of learning halfspaces with noise
In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 2006
Cited by 23 (3 self)
Learning an unknown halfspace (also called a perceptron) from labeled examples is one of the classic problems in machine learning. In the noise-free case, when a halfspace consistent with all the training examples exists, the problem can be solved in polynomial time using linear programming. However, under the promise that a halfspace consistent with a fraction $(1 - \varepsilon)$ of the examples exists (for some small constant $\varepsilon > 0$), it was not known how to efficiently find a halfspace that is correct on even 51% of the examples. Nor was any hardness result known that ruled out agreement on more than 99.9% of the examples. In this work, we close this gap in our understanding and prove that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense. Specifically, for arbitrary $\varepsilon, \delta > 0$, we prove that given a set of example-label pairs from the hypercube, a fraction $(1 - \varepsilon)$ of which can be explained by a halfspace, it is NP-hard to find a halfspace that correctly labels a fraction $(1/2 + \delta)$ of the examples. The hardness result is tight, since it is trivial to get agreement on half of the examples. In learning theory parlance, we prove that weak proper agnostic learning of halfspaces is hard. This settles a question that was raised by Blum et al. in their work on learning halfspaces in the presence of random classification noise [10], and in some more recent works as well. Along the way, we also obtain a strong hardness result for another basic computational problem: solving a linear system over the rationals.
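The tightness remark (agreement on half of the examples is trivial) can be made concrete. This Python sketch uses illustrative helper names of our own (`halfspace_label`, `agreement`, `trivial_halfspace`) and assumes examples are points of $\{-1,+1\}^n$ with $\pm 1$ labels; it returns the better of the two constant halfspaces ($w = 0$), which always matches the majority label.

```python
from typing import List, Sequence, Tuple

# An example is a point in {-1,+1}^n together with a label in {-1,+1}.
Example = Tuple[Sequence[int], int]


def halfspace_label(w: Sequence[float], theta: float, x: Sequence[int]) -> int:
    """Label x by sign(<w, x> - theta), sending ties to +1."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if s >= 0 else -1


def agreement(w: Sequence[float], theta: float, data: List[Example]) -> float:
    """Fraction of examples whose label matches the halfspace's prediction."""
    hits = sum(1 for x, y in data if halfspace_label(w, theta, x) == y)
    return hits / len(data)


def trivial_halfspace(data: List[Example]) -> Tuple[List[float], float]:
    """Return the better of the two constant halfspaces.

    With w = 0 the label is decided by theta alone: theta = -1 labels every
    point +1 and theta = +1 labels every point -1. Picking the majority
    label guarantees agreement of at least 1/2.
    """
    n = len(data[0][0])
    positives = sum(1 for _, y in data if y == 1)
    theta = -1.0 if 2 * positives >= len(data) else 1.0
    return [0.0] * n, theta


data = [((1, -1, 1), 1), ((-1, -1, 1), -1), ((1, 1, -1), 1)]
w, theta = trivial_halfspace(data)
print(agreement(w, theta, data))  # 0.6666666666666666 (2 of 3 labels are +1)
```

The hardness result says that, under the $(1 - \varepsilon)$ promise, no polynomial-time algorithm can beat this baseline by any constant $\delta$ (unless P = NP).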
Cryptographic hardness for learning intersections of halfspaces
J. Comput. Syst. Sci.
Cited by 23 (11 self)
We give the first representation-independent hardness results for PAC learning intersections of halfspaces, a central concept class in computational learning theory. Our hardness results are derived from two public-key cryptosystems due to Regev, which are based on the worst-case hardness of well-studied lattice problems. Specifically, we prove that a polynomial-time algorithm for PAC learning intersections of $n^{\varepsilon}$ halfspaces (for a constant $\varepsilon > 0$) in $n$ dimensions would yield a polynomial-time solution to $\tilde{O}(n^{1.5})$-uSVP (unique shortest vector problem). We also prove that PAC learning intersections of $n^{\varepsilon}$ low-weight halfspaces would yield a polynomial-time quantum solution to $\tilde{O}(n^{1.5})$-SVP and $\tilde{O}(n^{1.5})$-SIVP (shortest vector problem and shortest independent vector problem, respectively). Our approach also yields the first representation-independent hardness results for learning polynomial-size depth-2 neural networks and polynomial-size depth-3 arithmetic circuits.
Key words: Cryptographic hardness results, intersections of halfspaces, computational learning theory, lattice-based cryptography
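For readers less familiar with the concept class, here is a minimal Python sketch of how an intersection of halfspaces labels a point; the names `in_halfspace` and `in_intersection` are hypothetical. Each halfspace is a threshold gate, and the AND on top is itself a threshold gate, so an intersection of $k$ halfspaces is computed by a depth-2 threshold circuit.

```python
from typing import List, Sequence, Tuple

# A halfspace {x : <w, x> >= theta}, stored as (w, theta).
Halfspace = Tuple[Sequence[float], float]


def in_halfspace(h: Halfspace, x: Sequence[float]) -> bool:
    w, theta = h
    return sum(wi * xi for wi, xi in zip(w, x)) >= theta


def in_intersection(hs: List[Halfspace], x: Sequence[float]) -> bool:
    """Positive iff x lies in every halfspace: an AND over k threshold gates."""
    return all(in_halfspace(h, x) for h in hs)


# Toy concept: the nonnegative quadrant of R^2 as the intersection of 2 halfspaces.
quadrant = [((1, 0), 0), ((0, 1), 0)]
print(in_intersection(quadrant, (1, 2)), in_intersection(quadrant, (1, -1)))  # True False
```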
The unbounded-error communication complexity of symmetric functions
In Proc. of the 49th Symposium on Foundations of Computer Science (FOCS), 2008
Cited by 14 (8 self)
We prove an essentially tight lower bound on the unbounded-error communication complexity of every symmetric function, i.e., $f(x, y) = D(|x \wedge y|)$, where $D\colon \{0, 1, \ldots, n\} \to \{0, 1\}$ is a given predicate and $x, y$ range over $\{0, 1\}^n$. Specifically, we show that the communication complexity of $f$ is between $\Theta(k/\log^5 n)$ and $\Theta(k \log n)$, where $k$ is the number of value changes of $D$ in $\{0, 1, \ldots, n\}$. The unbounded-error model is the most powerful of the basic models of communication (both classical and quantum), and proving lower bounds in it is a considerable challenge. The only previous nontrivial lower bounds for explicit functions in this model appear in the groundbreaking work of Forster (2001) and its extensions. Our proof is built around two novel ideas. First, we show that a given predicate $D$ gives rise to a rapidly mixing random walk on $\mathbb{Z}_2^n$, which allows us to reduce the problem to communication lower bounds for “typical” predicates. Second, we use Paturi’s approximation lower bounds (1992), suitably generalized here to clusters of real nodes in $[0, n]$ and interpreted in their dual form, to prove that a typical predicate behaves analogously to PARITY with respect to a smooth distribution on the inputs.
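The parameter $k$ is easy to compute for a concrete predicate. In this Python sketch (helper names are ours), `value_changes` counts the $t \in \{1, \ldots, n\}$ with $D(t) \neq D(t-1)$, and `symmetric_f` evaluates $f(x, y) = D(|x \wedge y|)$. By the stated bounds, $k$ determines the unbounded-error complexity up to polylogarithmic factors, so disjointness ($k = 1$) sits at the easy end and inner product mod 2 ($k = n$) at the hard end.

```python
from typing import Callable, Sequence

Predicate = Callable[[int], int]


def value_changes(D: Predicate, n: int) -> int:
    """Return k, the number of t in 1..n with D(t) != D(t - 1)."""
    return sum(1 for t in range(1, n + 1) if D(t) != D(t - 1))


def symmetric_f(D: Predicate, x: Sequence[int], y: Sequence[int]) -> int:
    """Evaluate f(x, y) = D(|x AND y|) for bit vectors x, y in {0,1}^n."""
    return D(sum(xi & yi for xi, yi in zip(x, y)))


def disjointness(t: int) -> int:
    # 1 iff x and y share no 1-position; D changes value once, so k = 1
    return 1 if t == 0 else 0


def inner_product_mod2(t: int) -> int:
    # parity of |x AND y|; D changes value at every step, so k = n
    return t % 2


n = 6
print(value_changes(disjointness, n), value_changes(inner_product_mod2, n))  # 1 6
```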
Unconditional lower bounds for learning intersections of halfspaces
Machine Learning, 2007
Cited by 14 (12 self)
We prove new lower bounds for learning intersections of halfspaces, one of the most important concept classes in computational learning theory. Our main result is that any statistical-query algorithm for learning the intersection of $\sqrt{n}$ halfspaces in $n$ dimensions must make $2^{\Omega(\sqrt{n})}$ queries. This is the first non-trivial lower bound on the statistical query dimension for this concept class (the previous best lower bound was $n^{\Omega(\log n)}$). Our lower bound holds even for intersections of low-weight halfspaces. In the latter case, it is nearly tight. We also show that the intersection of two majorities (low-weight halfspaces) cannot be computed by a polynomial threshold function (PTF) with fewer than $n^{\Omega(\log n/\log\log n)}$ monomials. This is the first super-polynomial lower bound on the PTF length of this concept class, and is nearly optimal. For intersections of $k = \omega(\log n)$ low-weight halfspaces, we improve our lower bound to $\min\{2^{\Omega(\sqrt{n})}, n^{\Omega(k/\log k)}\}$, which too is nearly optimal. As a consequence, intersections of even two halfspaces are not computable by polynomial-weight PTFs, the most expressive class of functions known to be efficiently learnable via Jackson’s Harmonic Sieve algorithm. Finally, we report our progress on the weak learnability of intersections of halfspaces under the uniform distribution.
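PTF length (number of monomials) and weight (sum of the absolute values of the integer coefficients) are the two size measures referred to above. This small Python sketch uses a dictionary encoding of a PTF over $\{-1,+1\}^n$ that is our own assumption, not the paper's notation; it evaluates a PTF and reports both measures for 3-bit majority.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# Monomial = tuple of variable indices multiplied together; () is the constant term.
Monomial = Tuple[int, ...]
PTF = Dict[Monomial, int]


def ptf_sign(p: PTF, x: Sequence[int]) -> int:
    """Evaluate sign(sum_S a_S * prod_{i in S} x_i) at x in {-1,+1}^n (a zero sum maps to +1)."""
    total = 0
    for S, a in p.items():
        term = a
        for i in S:
            term *= x[i]
        total += term
    return 1 if total >= 0 else -1


def ptf_length(p: PTF) -> int:
    """Number of monomials with a nonzero coefficient."""
    return sum(1 for a in p.values() if a != 0)


def ptf_weight(p: PTF) -> int:
    """Sum of absolute values of the integer coefficients."""
    return sum(abs(a) for a in p.values())


# Majority of 3 bits is a degree-1 PTF with length 3 and weight 3.
maj3: PTF = {(0,): 1, (1,): 1, (2,): 1}
assert all(ptf_sign(maj3, x) == (1 if sum(x) > 0 else -1)
           for x in product((-1, 1), repeat=3))
print(ptf_length(maj3), ptf_weight(maj3))  # 3 3
```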
The sign-rank of AC^0
In Proc. of the 49th Symposium on Foundations of Computer Science (FOCS), 2008
Cited by 14 (9 self)
The sign-rank of a matrix $A = [A_{ij}]$ with $\pm 1$ entries is the least rank of a real matrix $B = [B_{ij}]$ with $A_{ij} B_{ij} > 0$ for all $i, j$. We obtain the first exponential lower bound on the sign-rank of a function in AC$^0$. Namely, let $f(x, y) = \bigwedge_{i=1}^{m} \bigvee_{j=1}^{m^2} (x_{ij} \wedge y_{ij})$. We show that the matrix $[f(x, y)]_{x,y}$ has sign-rank $2^{\Omega(m)}$. This in particular implies that $\Sigma_2^{cc} \not\subseteq \mathrm{UPP}^{cc}$, which solves a long-standing open problem posed by Babai, Frankl, and Simon (1986). Our result additionally implies a lower bound in learning theory. Specifically, let $\phi_1, \ldots, \phi_r\colon \{0, 1\}^n \to \mathbb{R}$ be functions such that every DNF formula $f\colon \{0, 1\}^n \to \{-1, +1\}$ of polynomial size has the representation $f \equiv \mathrm{sign}(a_1\phi_1 + \cdots + a_r\phi_r)$ for some reals $a_1, \ldots, a_r$. We prove that then $r \geq 2^{\Omega(n^{1/3})}$, which essentially matches an upper bound of $2^{\tilde{O}(n^{1/3})}$ due to Klivans and Servedio (2001). Finally, our work yields the first exponential lower bound on the size of threshold-of-majority circuits computing a function in AC$^0$. This substantially generalizes and strengthens the results of Krause and Pudlák (1997).
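A toy Python check of the sign-rank definition, with helper names of our own. The example is not from the paper: the $N \times N$ equality matrix ($+1$ on the diagonal, $-1$ elsewhere) has full rank, yet the real matrix $B_{ij} = 1/2 - (i - j)^2$ has the same sign pattern and rank 3, so the equality matrix has sign-rank at most 3. Ranks are computed exactly over the rationals with `fractions.Fraction` to avoid floating-point tolerance issues.

```python
from fractions import Fraction
from typing import List, Sequence


def sign_represents(A: Sequence[Sequence[int]], B: Sequence[Sequence[Fraction]]) -> bool:
    """True iff A_ij * B_ij > 0 for every entry, i.e. B has the sign pattern of A."""
    return all(a * b > 0 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def rank(M: Sequence[Sequence]) -> int:
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    rows: List[List[Fraction]] = [[Fraction(v) for v in r] for r in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


N = 5
# Equality matrix: +1 on the diagonal, -1 elsewhere (full rank N).
A = [[1 if i == j else -1 for j in range(N)] for i in range(N)]
# B_ij = 1/2 - i^2 + 2ij - j^2 factors through the vectors (1, i, i^2), so rank(B) <= 3.
B = [[Fraction(1, 2) - (i - j) ** 2 for j in range(N)] for i in range(N)]
print(sign_represents(A, B), rank(A), rank(B))  # True 5 3
```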
Fast Cryptographic Primitives and Circular-Secure Encryption Based on Hard Learning Problems
Cited by 9 (1 self)
The well-studied task of learning a linear function with errors is a seemingly hard problem and the basis for several cryptographic schemes. Here we demonstrate additional applications that enjoy strong security properties and a high level of efficiency. Namely, we construct:
1. Public-key and symmetric-key cryptosystems that provide security for key-dependent messages and enjoy circular security. Our schemes are highly efficient: in both cases the ciphertext is only a constant factor larger than the plaintext, and the cost of encryption and decryption is only n · polylog(n) bit operations per message symbol in the public-key case, and polylog(n) bit operations in the symmetric case.
2. Two efficient pseudorandom objects: a “weak randomized pseudorandom function” (a relaxation of standard PRF) that can be computed obliviously via a simple protocol, and a length-doubling pseudorandom generator that can be computed by a circuit of n · …
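The abstract speaks generically of learning a linear function with errors. This Python sketch assumes the binary variant (learning parity with noise), where each sample is $(a, \langle a, s\rangle + e \bmod 2)$ with $a$ uniform and $e$ Bernoulli; the function names, dimension 16, and noise rate 1/8 are arbitrary illustrative choices, not parameters from the paper.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[int], int]


def lpn_samples(secret: List[int], m: int, noise_rate: float,
                rng: random.Random) -> List[Sample]:
    """Draw m pairs (a, <a, s> + e mod 2), a uniform in {0,1}^n, e ~ Bernoulli(noise_rate)."""
    out: List[Sample] = []
    for _ in range(m):
        a = [rng.randrange(2) for _ in secret]
        e = 1 if rng.random() < noise_rate else 0
        b = (sum(ai * si for ai, si in zip(a, secret)) + e) % 2
        out.append((a, b))
    return out


def count_errors(samples: List[Sample], secret: List[int]) -> int:
    """How many labels disagree with the noiseless linear function <a, s> mod 2."""
    return sum(1 for a, b in samples
               if b != sum(ai * si for ai, si in zip(a, secret)) % 2)


rng = random.Random(0)
secret = [rng.randrange(2) for _ in range(16)]
samples = lpn_samples(secret, 1000, noise_rate=0.125, rng=rng)
print(count_errors(samples, secret))  # about 125 in expectation
```

Without the noise term, the secret could be recovered by Gaussian elimination over $\mathbb{F}_2$; the Bernoulli errors are what make the problem seemingly hard.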
Agnostic Learning of Monomials by Halfspaces is Hard
Cited by 9 (6 self)
We prove the following strong hardness result for learning: given a distribution on labeled examples from the hypercube such that there exists a monomial (or conjunction) consistent with a $(1 - \epsilon)$-fraction of the examples, it is NP-hard to find a halfspace that is correct on a $(1/2 + \epsilon)$-fraction of the examples, for arbitrary constant $\epsilon > 0$. In learning theory terms, weak agnostic learning of monomials by halfspaces is NP-hard. This hardness result bridges and subsumes two previous results, which showed similar hardness results for the proper learning of monomials and of halfspaces. As immediate corollaries of our result, we give the first optimal hardness results for weak agnostic learning of decision lists and majorities. Our techniques are quite different from previous hardness proofs for learning. We use an invariance principle and sparse approximation of halfspaces from recent work on fooling halfspaces to give a new natural list decoding of a halfspace in the context of dictatorship tests/label cover reductions. In addition, unlike previous invariance-principle-based proofs, which are only known to give Unique Games hardness, we give a reduction from a smooth version of Label Cover that is known to be NP-hard.
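Halfspace hypotheses are at least as expressive as monomials because every monomial is itself a halfspace. This short Python check uses positive literals only, for simplicity (a negated literal $\bar{x}_i$ is handled by substituting $1 - x_i$), and confirms the threshold form exhaustively on $\{0,1\}^5$; the helper names are illustrative.

```python
from itertools import product
from typing import Sequence


def monomial(x: Sequence[int], idx: Sequence[int]) -> int:
    """Conjunction of the positive literals x_i, i in idx, over {0,1}^n."""
    return int(all(x[i] == 1 for i in idx))


def monomial_as_halfspace(x: Sequence[int], idx: Sequence[int]) -> int:
    """The same conjunction as a threshold: sum_{i in idx} x_i >= |idx| - 1/2."""
    return int(sum(x[i] for i in idx) >= len(idx) - 0.5)


idx = (0, 2, 3)
same = all(monomial(x, idx) == monomial_as_halfspace(x, idx)
           for x in product((0, 1), repeat=5))
print(same)  # True
```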
Hardness of reconstructing multivariate polynomials over finite fields
In Proc. 48th IEEE Symp. on Foundations of Computer Science (FOCS’07), 2007
Cited by 7 (4 self)
We study the polynomial reconstruction problem for low-degree multivariate polynomials over $\mathbb{F}_2$. In this problem, we are given a set of points $x \in \{0, 1\}^n$ and target values $f(x) \in \{0, 1\}$ for each of these points, with the promise that there is a polynomial over $\mathbb{F}_2$ of degree at most $d$ that agrees with $f$ on a $1 - \varepsilon$ fraction of the points. Our goal is to find a degree-$d$ polynomial that has good agreement with $f$. We show that it is NP-hard to find a polynomial that agrees with $f$ on more than a $1 - 2^{-d} + \delta$ fraction of the points, for any $\varepsilon, \delta > 0$. This holds even with the stronger promise that the polynomial that fits the data is in fact linear, whereas the algorithm is allowed to find a polynomial of degree $d$. Previously, the only known hardness of approximation (or even NP-completeness) was for the case $d = 1$, which follows from a celebrated result of Håstad [16]. In the setting of computational learning, our result shows the hardness of (non-proper) agnostic learning of parities, where the learner is allowed a low-degree polynomial over $\mathbb{F}_2$ as a hypothesis. This is the first non-proper hardness result for this central problem in computational learning. Our results extend to multivariate polynomial reconstruction over any finite field.
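To make the agreement objective concrete, here is a minimal Python sketch with an encoding of our own: a multilinear polynomial over $\mathbb{F}_2$ is a list of monomials (sets of variable indices), and agreement is the fraction of labeled points on which it matches $f$. The data come from a linear function with one label corrupted; this illustrates only the objective, not the reduction.

```python
from itertools import product
from typing import FrozenSet, List, Sequence, Tuple

# A multilinear polynomial over F_2: a list of monomials, each a set of variable
# indices; frozenset() stands for the constant 1.
Poly = List[FrozenSet[int]]


def eval_f2(poly: Poly, x: Sequence[int]) -> int:
    """Sum (XOR) of the monomials that evaluate to 1 at x in {0,1}^n."""
    total = 0
    for mono in poly:
        total ^= int(all(x[i] for i in mono))
    return total


def degree(poly: Poly) -> int:
    return max((len(m) for m in poly), default=0)


def agreement(poly: Poly, data: List[Tuple[Sequence[int], int]]) -> float:
    """Fraction of labeled points (x, f(x)) on which poly matches f."""
    return sum(1 for x, fx in data if eval_f2(poly, x) == fx) / len(data)


# Labels from the linear function x0 + x2 over F_2, with one label flipped.
linear: Poly = [frozenset({0}), frozenset({2})]
data = [(x, eval_f2(linear, x)) for x in product((0, 1), repeat=3)]
data[5] = (data[5][0], 1 - data[5][1])
print(degree(linear), agreement(linear, data))  # 1 0.875
```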
On agnostic boosting and parity learning
Proceedings of the Symposium on Theory of Computing, 2008
Cited by 7 (1 self)
The motivating problem is agnostically learning parity functions, i.e., parity with arbitrary or adversarial noise. Specifically, given random labeled examples from an arbitrary distribution, we would like to produce a hypothesis whose accuracy nearly matches the accuracy of the best parity function. Our algorithm runs in time $2^{O(n/\log n)}$, which matches the best known for the easier cases of learning parities with random classification noise (Blum et al., 2003) and for agnostically learning parities over the uniform distribution on inputs (Feldman et al., 2006). Our approach is worth noting. We give an agnostic boosting theorem that is capable of nearly achieving optimal accuracy, improving upon earlier studies (starting with Ben-David et al., 2001). This is combined with an algorithm that harnesses an unexpected (very weak) agnostic ability of the (random-noise) parity learning algorithm of Blum et al. (2000). Our agnostic boosting framework is completely general and may be applied to other agnostic learning problems. Hence, it also sheds light on the actual difficulty of agnostic learning by showing that full agnostic boosting is indeed possible, despite previous lower bounds.
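For contrast with the $2^{O(n/\log n)}$ algorithm, the naive agnostic learner simply tries every parity and keeps the one with the best empirical accuracy, in time about $2^n$ times the sample size. This Python sketch is that exhaustive baseline only, not the paper's boosting-based method; the helper names and toy data are our own.

```python
from itertools import product
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]


def parity(S: Sequence[int], x: Sequence[int]) -> int:
    """chi_S(x) = <S, x> mod 2 for a 0/1 mask S."""
    return sum(s & xi for s, xi in zip(S, x)) % 2


def best_parity(data: List[Example], n: int) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive agnostic learner: try all 2^n masks, keep the most accurate one."""
    best, best_acc = (0,) * n, -1.0
    for S in product((0, 1), repeat=n):
        acc = sum(1 for x, y in data if parity(S, x) == y) / len(data)
        if acc > best_acc:
            best, best_acc = S, acc
    return best, best_acc


target = (1, 0, 1, 1)
data = [(x, parity(target, x)) for x in product((0, 1), repeat=4)]
for i in (3, 10):  # arbitrarily flip two labels
    x, y = data[i]
    data[i] = (x, 1 - y)
print(best_parity(data, 4))  # ((1, 0, 1, 1), 0.875)
```

Any other parity agrees with the target on exactly half of the cube, so with only two flipped labels the target remains the unique best hypothesis.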
Improved Guarantees for Learning via Similarity Functions
Cited by 4 (2 self)
We continue the investigation of natural conditions for a similarity function to allow learning, without requiring the similarity function to be a valid kernel, or referring to an implicit high-dimensional space. We provide a new notion of a “good similarity function” that builds upon the previous definition of Balcan and Blum (2006) but improves on it in two important ways. First, as with the previous definition, any large-margin kernel is also a good similarity function in our sense, but the translation now results in a much milder increase in the labeled sample complexity. Second, we prove that for distribution-specific PAC learning, our new notion is strictly more powerful than the traditional notion of a large-margin kernel. In particular, we show that for any hypothesis class $C$ there exists a similarity function under our definition allowing learning with $O(\log |C|)$ labeled examples. However, in a lower bound which may be of independent interest, we show that for any class $C$ of pairwise uncorrelated functions, there is no kernel with margin $\gamma \geq 8/\sqrt{|C|}$ for all $f \in C$, even if one allows average hinge loss as large as 0.5. Thus, the sample complexity for learning such classes with SVMs is $\Omega(|C|)$. This extends work of Ben-David et al. (2003) and Forster and Simon (2006), who give hardness results with comparable margin bounds, but at much lower error rates. Our new notion of similarity relies upon $L_1$-regularized learning, and our separation result is related to a separation result between what is learnable with $L_1$ vs. $L_2$ regularization.
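A standard example of a pairwise uncorrelated class is the set of all $2^n$ parities $\chi_S(x) = \prod_{i \in S} x_i$ under the uniform distribution on $\{-1,+1\}^n$: since $\chi_S \chi_T = \chi_{S \triangle T}$, the correlation averages to zero whenever $S \neq T$. A small exact check in Python (helper names are ours):

```python
from fractions import Fraction
from itertools import combinations, product
from typing import Sequence, Tuple


def chi(S: Tuple[int, ...], x: Sequence[int]) -> int:
    """Parity character on {-1,+1}^n: the product of x_i over i in S."""
    out = 1
    for i in S:
        out *= x[i]
    return out


def correlation(S: Tuple[int, ...], T: Tuple[int, ...], n: int) -> Fraction:
    """Exact E_x[chi_S(x) * chi_T(x)] for x uniform on {-1,+1}^n."""
    pts = list(product((-1, 1), repeat=n))
    return Fraction(sum(chi(S, x) * chi(T, x) for x in pts), len(pts))


n = 3
C = [S for k in range(n + 1) for S in combinations(range(n), k)]  # all 2^n parities
print(len(C), all(correlation(S, T, n) == 0 for S, T in combinations(C, 2)))  # 8 True
```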

