Learning juntas
- In Proc. 35th Ann. ACM Symp. on the Theory of Computing, 2003
Cited by 23 (2 self)
We consider a fundamental problem in computational learning theory: learning an arbitrary Boolean function which depends on an unknown set of k out of n Boolean variables. We give an algorithm for learning such functions from uniform random examples which runs in time roughly $(n^k)^{\omega/(\omega+1)}$, where $\omega < 2.376$ is the matrix multiplication exponent. We thus obtain the first polynomial factor improvement on the naive $n^k$ time bound which can be achieved via exhaustive search. Our algorithm and analysis exploit new structural properties of Boolean functions.
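For orientation, the naive exhaustive-search baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's improved algorithm: it tries every k-subset of the n variables and keeps the first one on which the labels are a consistent function. The function name `learn_junta_exhaustive`, the hidden target `hidden`, and the sample size are assumptions made for the example.

```python
import itertools
import random


def learn_junta_exhaustive(examples, n, k):
    """Naive baseline: scan all C(n, k) subsets (about n^k of them).

    `examples` is a list of (x, y) pairs, x a 0/1 tuple of length n.
    Returns (subset, truth_table) for the first subset on which the
    labels are consistent, or None if no k-subset fits.
    """
    for subset in itertools.combinations(range(n), k):
        table = {}
        consistent = True
        for x, y in examples:
            key = tuple(x[i] for i in subset)
            # setdefault stores y the first time a key is seen and
            # returns the stored label afterwards.
            if table.setdefault(key, y) != y:
                consistent = False
                break
        if consistent:
            return subset, table
    return None


# Hypothetical target: a 3-junta on 8 variables (an assumption for the demo).
def hidden(x):
    return x[1] ^ (x[4] & x[6])


rng = random.Random(0)
n, k = 8, 3
examples = []
for _ in range(300):
    x = tuple(rng.randint(0, 1) for _ in range(n))
    examples.append((x, hidden(x)))

result = learn_junta_exhaustive(examples, n, k)
print(result[0] if result else None)
```

The outer loop over subsets is what gives the roughly $n^k$ running time that the abstract's algorithm improves on.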
Learning active classifiers
- Proceedings of the Thirteenth International Conference on Machine Learning (ICML96), 1996
Cited by 17 (5 self)
Most classification algorithms are "passive", in that they assign a class-label to each instance based only on the description given, even if that description is incomplete. By contrast, an active classifier can -- at some cost -- obtain the values of missing attributes, before deciding upon a class label. This can be useful when considering, for example, whether to extract some information from the web for a critical decision or whether to gather information for a medical test or experiment. The expected utility of using an active classifier depends on both the cost required to obtain the additional attribute values and the penalty incurred if the classifier outputs the wrong classification. This paper analyzes the problem of learning optimal active classifiers, using a variant of the probably-approximately-correct (PAC) model. After defining the framework, we show that this task can be achieved efficiently when the active classifier is allowed to perform only (at most) a constant number of tests. We then show that, in more general environments, the task is often intractable.
On the Fourier spectrum of symmetric Boolean functions with applications to learning symmetric juntas
- In Proceedings of 20th IEEE Conference on Computational Complexity, 2005
Cited by 9 (3 self)
We study the following question: What is the smallest t such that every symmetric Boolean function on k variables (which is not a constant or a parity function) has a non-zero Fourier coefficient of order at least 1 and at most t? We exclude the constant functions, for which there is no such t, and the parity functions, for which t has to be k. Let τ(k) be the smallest such t. The main contribution of this paper is a proof of the following self-similar nature of this question: if $\tau(l) \le s$, then for any $\epsilon > 0$ and for $k \ge k_0(l, \epsilon)$, $\tau(k) \le k\left(\frac{s+1}{l+1} + \epsilon\right)$. Coupling this result with a computer-based search which establishes τ(30) = 2, one obtains that for large enough k, τ(k) ≤ 3k/31. The motivation for our work is to understand the complexity of learning symmetric juntas. A k-junta is a Boolean function of n variables that depends only on an unknown subset of k variables. If f is symmetric in the variables it depends on, it is called a symmetric k-junta. Our results imply an algorithm to learn the class of symmetric k-juntas, in the uniform PAC learning model, in time approximately $n^{3k/31}$. This improves on a result of Mossel, O'Donnell and Servedio in [11], who show that symmetric k-juntas can be ...
∗ Research supported by NSF grants CCR-0002299 and CCF-0431023.
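The quantity τ(k) can be checked by brute force for very small k. The sketch below is an illustration only and is not the computer search reported in the abstract: it represents a symmetric function by its values on each Hamming weight, uses 0/1 outputs (which changes non-empty-set coefficients only by a constant factor relative to ±1 outputs), and exploits the fact that a symmetric function's Fourier coefficient depends only on the size of the set. The names `fourier_coefficient`, `smallest_nonzero_order`, and `tau` are assumptions.

```python
import itertools
from fractions import Fraction


def fourier_coefficient(f, k, t):
    """Exact coefficient of f on S = {0, ..., t-1}: E_x[f(x) * (-1)^(sum of x_i, i in S)]."""
    total = 0
    for x in itertools.product((0, 1), repeat=k):
        sign = -1 if sum(x[:t]) % 2 else 1
        total += sign * f(x)
    return Fraction(total, 2 ** k)


def smallest_nonzero_order(weight_values):
    """weight_values[w] is the 0/1 value of the function on inputs of Hamming weight w.

    Returns the smallest t in 1..k with a non-zero order-t coefficient,
    or None when the function is constant.
    """
    k = len(weight_values) - 1

    def f(x):
        return weight_values[sum(x)]

    for t in range(1, k + 1):
        if fourier_coefficient(f, k, t) != 0:
            return t
    return None


def tau(k):
    """Worst case of smallest_nonzero_order over all symmetric functions on
    k variables, skipping constants and parities (and negated parities)."""
    parity = [w % 2 for w in range(k + 1)]
    neg_parity = [1 - b for b in parity]
    worst = 0
    for values in itertools.product((0, 1), repeat=k + 1):
        values = list(values)
        if len(set(values)) == 1 or values in (parity, neg_parity):
            continue
        worst = max(worst, smallest_nonzero_order(values))
    return worst


print(smallest_nonzero_order([0, 0, 1, 1]))  # majority on 3 variables
print(tau(3))
```

The enumeration costs on the order of $2^{k+1} \cdot k \cdot 2^k$ evaluations, so it is practical only for small k.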
Improved bounds for testing juntas
- In Proc. 12th Workshop RANDOM, 2008
Cited by 5 (4 self)
We consider the problem of testing functions for the property of being a k-junta (i.e., of depending on at most k variables). Fischer, Kindler, Ron, Safra, and Samorodnitsky (J. Comput. Sys. Sci., 2004) showed that $\tilde{O}(k^2)/\epsilon$ queries are sufficient to test k-juntas, and conjectured that this bound is optimal for non-adaptive testing algorithms. Our main result is a non-adaptive algorithm for testing k-juntas with $\tilde{O}(k^{3/2})/\epsilon$ queries. This algorithm disproves the conjecture of Fischer et al. We also show that the query complexity of non-adaptive algorithms for testing juntas has a lower bound of min($\tilde{\Omega}(k/\epsilon)$, 2 ...
Sharper bounds for the hardness of prototype and feature selection
- Proc. of the 11th International Conference on Algorithmic Learning Theory, in: Lecture Notes in Artificial Intelligence, 2000
Cited by 4 (2 self)
As pointed out by Blum [Blu94], "nearly all results in Machine Learning [...] deal with problems of separating relevant from irrelevant information in some way". This paper is concerned with structural complexity issues regarding the selection of relevant Prototypes or Features. We give the first results proving that both problems can be much harder than expected in the literature for various notions of relevance. In particular, the worst-case bounds achievable by any efficient algorithm are proven to be very large, most of the time not so far from trivial bounds. We think these results give a theoretical justification for the numerous heuristic approaches found in the literature to cope with these problems.
Public Key Cryptography from Different Assumptions
- 2008
Cited by 4 (0 self)
We construct a new public key encryption based on two assumptions: 1. One can obtain a pseudorandom generator with small locality by connecting the outputs to the inputs using any sufficiently good unbalanced expander. 2. It is hard to distinguish between a random graph that is such an expander and a random graph where a (planted) random logarithmic-sized subset S of the outputs is connected to fewer than |S| inputs. The validity and strength of the assumptions raise interesting new algorithmic and pseudorandomness questions, and we explore their relation to the current state of the art.
On Agnostic Learning of Parities, Monomials and Halfspaces
- 2006
Cited by 3 (0 self)
We study the learnability of several fundamental concept classes in the agnostic learning framework of Haussler [Hau92] and Kearns et al. [KSS94]. We show that under the uniform distribution, agnostically learning parities reduces to learning parities with random classification noise, commonly referred to as the noisy parity problem. Together with the parity learning algorithm of Blum et al. [BKW03], this gives the first nontrivial algorithm for agnostic learning of parities. We use similar techniques to reduce learning of two other fundamental concept classes under the uniform distribution to learning of noisy parities. Namely, we show that learning of DNF expressions reduces to learning noisy parities of just a logarithmic number of variables, and learning of k-juntas reduces to learning noisy parities of k variables. We give essentially optimal hardness results for agnostic learning of monomials over $\{0, 1\}^n$ and halfspaces over $\mathbb{Q}^n$. We show that for any constant $\epsilon$, finding a monomial (halfspace) that agrees with an unknown function on a $1/2 + \epsilon$ fraction of examples is NP-hard even when there exists a monomial (halfspace) that agrees with the unknown function on a $1 - \epsilon$ fraction of examples. This resolves an open question due to Blum and significantly improves on a number of previous hardness results for these problems. We extend these results to $\epsilon = 2^{-\log^{1-\lambda} n}$ ($\epsilon = 2^{-\sqrt{\log n}}$ in the case of halfspaces) for any constant $\lambda > 0$ under stronger complexity assumptions.
Learning functions of k hidden variables
Cited by 2 (0 self)
We consider a fundamental problem in computational learning theory: learning an arbitrary Boolean function which depends on an unknown set of k out of n Boolean variables. We give an algorithm for learning such functions under the uniform distribution which runs in time roughly $(n^k)^{\omega/(\omega+1)}$, where $\omega < 2.376$ is the matrix multiplication exponent. We thus obtain the first polynomial factor improvement on a naive $n^k$ time bound which can be achieved via exhaustive search. Our algorithm and analysis exploit new structural properties of Boolean functions.
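As a rough worked check of the size of the improvement, the exponent $\omega/(\omega+1)$ can be evaluated at the stated bound on $\omega$. The only fact used beyond the abstract is that $x/(x+1)$ is increasing in $x$, so an upper bound on $\omega$ gives an upper bound on the exponent.

```latex
\[
\frac{\omega}{\omega+1} \;<\; \frac{2.376}{2.376+1} \;=\; \frac{2.376}{3.376} \;\approx\; 0.704,
\qquad\text{so}\qquad
(n^k)^{\frac{\omega}{\omega+1}} \;\le\; n^{0.704\,k}
\quad\text{versus the naive } n^{k}.
\]
```

In other words, the stated bound saves roughly a factor of $n^{0.296\,k}$ over exhaustive search, ignoring the lower-order terms hidden by "roughly".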
Learning Convex Concepts from Gaussian Distributions with PCA
Cited by 1 (1 self)
We present a new algorithm for learning a convex set in n-dimensional space given labeled examples drawn from any Gaussian distribution. The complexity of the algorithm is bounded by a fixed polynomial in n times a function of k and $\epsilon$, where k is the dimension of the normal subspace (the span of normal vectors to supporting hyperplanes of the convex set), and the output is a hypothesis that correctly classifies at least $1-\epsilon$ of the unknown Gaussian distribution. For the important case when the convex set is the intersection of k halfspaces, the complexity is $\mathrm{poly}(n, k, 1/\epsilon) + n \cdot \min k^{O(\log k/\epsilon^4)}$ ...

