Results 1–10 of 31
Learning juntas
 In Proc. 35th Ann. ACM Symp. on the Theory of Computing
, 2003
Abstract

Cited by 36 (2 self)
We consider a fundamental problem in computational learning theory: learning an arbitrary Boolean function which depends on an unknown set of k out of n Boolean variables. We give an algorithm for learning such functions from uniform random examples which runs in time roughly (n^k)^{ω/(ω+1)}, where ω < 2.376 is the matrix multiplication exponent. We thus obtain the first polynomial factor improvement on the naive n^k time bound which can be achieved via exhaustive search. Our algorithm and analysis exploit new structural properties of Boolean functions.
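As a concrete baseline, the naive n^k exhaustive search that the abstract improves upon can be sketched directly: try every size-k subset of variables and check whether the labels are consistent with some function of just those coordinates. This is a minimal illustration, not the paper's faster algorithm; the target function and parameters below are hypothetical.

```python
from itertools import combinations, product

def learn_junta_exhaustive(examples, n, k):
    """Naive n^k-time junta learning: for each size-k subset of the n
    variables, check whether the labels are consistent with some function
    of just those coordinates; return the first consistent subset."""
    for subset in combinations(range(n), k):
        table = {}  # restriction of x to the subset -> observed label
        consistent = True
        for x, y in examples:
            key = tuple(x[i] for i in subset)
            if table.setdefault(key, y) != y:
                consistent = False
                break
        if consistent:
            return subset, table
    return None

# Hypothetical target: a 2-junta (AND of variables 0 and 2) over n = 4 bits,
# with every input as an example.
examples = [(x, x[0] & x[2]) for x in product([0, 1], repeat=4)]
subset, table = learn_junta_exhaustive(examples, n=4, k=2)
```

Trying all C(n, k) subsets against all examples is what makes this n^k-time; the paper's contribution is precisely beating this exponent.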
Public Key Cryptography from Different Assumptions
, 2008
Abstract

Cited by 22 (4 self)
We construct a new public key encryption based on two assumptions: 1. One can obtain a pseudorandom generator with small locality by connecting the outputs to the inputs using any sufficiently good unbalanced expander. 2. It is hard to distinguish between a random graph that is such an expander and a random graph where a (planted) random logarithmic-sized subset S of the outputs is connected to fewer than |S| inputs. The validity and strength of the assumptions raise interesting new algorithmic and pseudorandomness questions, and we explore their relation to the current state of the art.
Learning active classifiers
 Proceedings of the Thirteenth International Conference on Machine Learning (ICML-96)
, 1996
Abstract

Cited by 19 (5 self)
Most classification algorithms are "passive", in that they assign a class label to each instance based only on the description given, even if that description is incomplete. By contrast, an active classifier can, at some cost, obtain the values of missing attributes before deciding upon a class label. This can be useful when considering, for example, whether to extract some information from the web for a critical decision or whether to gather information for a medical test or experiment. The expected utility of using an active classifier depends on both the cost required to obtain the additional attribute values and the penalty incurred if the classifier outputs the wrong classification. This paper analyzes the problem of learning optimal active classifiers, using a variant of the probably-approximately-correct (PAC) model. After defining the framework, we show that this task can be achieved efficiently when the active classifier is allowed to perform only (at most) a constant number of tests. We then show that, in more general environments, the task is often intractable.
On Agnostic Learning of Parities, Monomials and Halfspaces
, 2006
Abstract

Cited by 18 (7 self)
We study the learnability of several fundamental concept classes in the agnostic learning framework of Haussler [Hau92] and Kearns et al. [KSS94]. We show that under the uniform distribution, agnostically learning parities reduces to learning parities with random classification noise, commonly referred to as the noisy parity problem. Together with the parity learning algorithm of Blum et al. [BKW03], this gives the first nontrivial algorithm for agnostic learning of parities. We use similar techniques to reduce learning of two other fundamental concept classes under the uniform distribution to learning of noisy parities. Namely, we show that learning of DNF expressions reduces to learning noisy parities of just a logarithmic number of variables, and learning of k-juntas reduces to learning noisy parities of k variables. We give essentially optimal hardness results for agnostic learning of monomials over {0,1}^n and halfspaces over Q^n. We show that for any constant ɛ, finding a monomial (halfspace) that agrees with an unknown function on a 1/2 + ɛ fraction of examples is NP-hard even when there exists a monomial (halfspace) that agrees with the unknown function on a 1 − ɛ fraction of examples. This resolves an open question due to Blum and significantly improves on a number of previous hardness results for these problems. We extend these results to ɛ = 2^{−log^{1−λ} n} (ɛ = 2^{−√log n} in the case of halfspaces) for any constant λ > 0 under stronger complexity assumptions.
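For contrast with the noisy parity problem the abstract reduces to, the noise-free version of parity learning is easy: Gaussian elimination over GF(2) recovers the hidden parity exactly from consistent examples. A minimal sketch under that noiseless assumption; the secret and examples are hypothetical.

```python
import numpy as np

def learn_parity_noiseless(X, y):
    """Recover c with <c, x> = y (mod 2) on every example via Gaussian
    elimination over GF(2). This is the easy noise-free case; the noisy
    variant targeted by the abstract's reductions is far harder."""
    A = np.hstack([X, y[:, None]]) % 2   # augmented matrix over GF(2)
    m, n = X.shape
    row, pivots = 0, []
    for col in range(n):
        pr = next((r for r in range(row, m) if A[r, col]), None)
        if pr is None:
            continue  # free variable; leave its coefficient at 0
        A[[row, pr]] = A[[pr, row]]      # swap pivot row into place
        for r in range(m):
            if r != row and A[r, col]:
                A[r] = (A[r] + A[row]) % 2
        pivots.append(col)
        row += 1
    c = np.zeros(n, dtype=int)
    for r, col in enumerate(pivots):
        c[col] = A[r, n]
    return c

# Hypothetical secret parity on 4 bits and a full-rank set of examples.
secret = np.array([1, 0, 1, 0])
X = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 1, 1]])
y = X @ secret % 2
```

Even a small amount of random label noise defeats this elimination, which is why the noisy parity problem is a meaningful reduction target.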
On the Fourier spectrum of symmetric Boolean functions with applications to learning symmetric juntas
 In Proceedings of 20th IEEE Conference on Computational Complexity
, 2005
Abstract

Cited by 16 (4 self)
We study the following question: What is the smallest t such that every symmetric boolean function on k variables (which is not a constant or a parity function) has a nonzero Fourier coefficient of order at least 1 and at most t? We exclude the constant functions, for which there is no such t, and the parity functions, for which t has to be k. Let τ(k) be the smallest such t. The main contribution of this paper is a proof of the following self-similar nature of this question: if τ(l) ≤ s, then for any ɛ > 0 and for k ≥ k0(l, ɛ), τ(k) ≤ ((s+1)/(l+1) + ɛ)k. Coupling this result with a computer-based search which establishes τ(30) = 2, one obtains that for large enough k, τ(k) ≤ 3k/31. The motivation for our work is to understand the complexity of learning symmetric juntas. A k-junta is a boolean function of n variables that depends only on an unknown subset of k variables. If f is symmetric in the variables it depends on, it is called a symmetric k-junta. Our results imply an algorithm to learn the class of symmetric k-juntas, in the uniform PAC learning model, in time approximately n^{3k/31}. This improves on a result of Mossel, O'Donnell and Servedio in [11].
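The quantity τ(k) above can be checked directly for small symmetric functions by enumerating Fourier coefficients over the uniform distribution. A brute-force sketch (exponential in k, for illustration only; the example functions are chosen here, not taken from the paper):

```python
from itertools import combinations, product

def fourier_coeff(f, k, S):
    """hat-f(S) = E_x[ f(x) * (-1)^{sum_{i in S} x_i} ] over uniform
    x in {0,1}^k, for a +/-1-valued boolean function f."""
    total = sum(f(x) * (-1) ** sum(x[i] for i in S)
                for x in product([0, 1], repeat=k))
    return total / 2 ** k

def min_low_order(f, k):
    """Smallest t >= 1 such that f has a nonzero Fourier coefficient of
    order between 1 and t; tau(k) bounds this over all symmetric f that
    are neither constant nor parity."""
    for t in range(1, k + 1):
        for S in combinations(range(k), t):
            if abs(fourier_coeff(f, k, S)) > 1e-12:
                return t
    return None  # constant functions: no such t exists

maj3 = lambda x: 1 if sum(x) >= 2 else -1  # symmetric majority on 3 bits
par3 = lambda x: (-1) ** sum(x)            # parity: the excluded case, t = k
```

Majority on 3 bits already has nonzero order-1 coefficients, while parity's only nonzero coefficient sits at order k, matching the exclusions in the definition of τ.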
Improved bounds for testing juntas
 In Proc. 12th Workshop RANDOM
, 2008
Abstract

Cited by 13 (5 self)
Abstract. We consider the problem of testing functions for the property of being a k-junta (i.e., of depending on at most k variables). Fischer, Kindler, Ron, Safra, and Samorodnitsky (J. Comput. Sys. Sci., 2004) showed that Õ(k^2)/ɛ queries are sufficient to test k-juntas, and conjectured that this bound is optimal for nonadaptive testing algorithms. Our main result is a nonadaptive algorithm for testing k-juntas with Õ(k^{3/2})/ɛ queries. This algorithm disproves the conjecture of Fischer et al. We also show that the query complexity of nonadaptive algorithms for testing juntas has a lower bound of min(Ω̃(k/ɛ), 2 ...
Learning Convex Concepts from Gaussian Distributions with PCA
Abstract

Cited by 6 (1 self)
Abstract—We present a new algorithm for learning a convex set in n-dimensional space given labeled examples drawn from any Gaussian distribution. The complexity of the algorithm is bounded by a fixed polynomial in n times a function of k and ɛ, where k is the dimension of the normal subspace (the span of normal vectors to supporting hyperplanes of the convex set) and the output is a hypothesis that correctly classifies at least 1−ɛ of the unknown Gaussian distribution. For the important case when the convex set is the intersection of k halfspaces, the complexity is poly(n, k, 1/ɛ) + n · min(k^{O(log k/ɛ^4)}, ...
Open problem: Learning a function of r relevant variables
 In Proceeding of COLT
, 2003
Abstract

Cited by 6 (0 self)
This problem has been around for a while but is one of my favorites. I will state it here in three forms, discuss a number of known results (some easy and some more intricate), and finally end with small financial incentives for various kinds of partial progress. This problem appears in various guises in [BFKL93,Blu94,MOS03]. To begin we need the following standard definition: a boolean function f over {0,1}^n has (at most) r relevant variables if there exist r indices i1,...,ir such that f(x) = g(x_{i1},...,x_{ir}) for some boolean function g over {0,1}^r. In other words, the value of f is determined by only a subset of r of its n input variables. For instance, the function f(x) = x1·x̄2 ∨ x2·x̄5 ∨ x5·x̄1 has three relevant variables. The "class of boolean functions with r relevant variables" is the set of all such functions, over all possible g and sets {i1,...,ir}. The problems are: (a) Does there exist a polynomial time algorithm for learning the class of boolean functions that have lg(n) relevant variables, over the uniform distribution on {0,1}^n?
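The definition above is easy to check by brute force for small n: a variable is relevant exactly when flipping it changes f on at least one input. A sketch verifying the three relevant variables of the example function (exponential-time, for illustration only; the embedding in n = 5 bits is an assumption made here):

```python
from itertools import product

def relevant_variables(f, n):
    """Indices i such that flipping x_i changes f on at least one input,
    found by exhaustive enumeration of {0,1}^n."""
    relevant = []
    for i in range(n):
        for x in product([0, 1], repeat=n):
            y = list(x)
            y[i] ^= 1  # flip the i-th bit
            if f(x) != f(tuple(y)):
                relevant.append(i)
                break
    return relevant

# The example from the text: f(x) = x1·x̄2 ∨ x2·x̄5 ∨ x5·x̄1
# (1-indexed in the text; here embedded in n = 5 input bits, 0-indexed).
def f(x):
    x1, x2, x5 = x[0], x[1], x[4]
    return (x1 & (1 - x2)) | (x2 & (1 - x5)) | (x5 & (1 - x1))
```

Only the three variables appearing in the formula survive the flip test, so f indeed has exactly three relevant variables.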
Sharper bounds for the hardness of prototype and feature selection
 Proc. of the 11th International Conference on Algorithmic Learning Theory, in: Lecture Notes in Artificial Intelligence
, 2000
Abstract

Cited by 4 (2 self)
As pointed out by Blum [Blu94], "nearly all results in Machine Learning [...] deal with problems of separating relevant from irrelevant information in some way". This paper is concerned with structural complexity issues regarding the selection of relevant Prototypes or Features. We give the first results proving that both problems can be much harder than expected in the literature for various notions of relevance. In particular, the worst-case bounds achievable by any efficient algorithm are proven to be very large, most of the time not so far from trivial bounds. We think these results give a theoretical justification for the numerous heuristic approaches found in the literature to cope with these problems.