## On the learnability of discrete distributions (1994)

Venue: The 25th Annual ACM Symposium on Theory of Computing

Citations: 95 (11 self)

### BibTeX

@INPROCEEDINGS{Kearns94onthe,
  author    = {Michael Kearns and Ronitt Rubinfeld},
  title     = {On the learnability of discrete distributions},
  booktitle = {The 25th Annual ACM Symposium on Theory of Computing},
  year      = {1994},
  pages     = {273--282},
  publisher = {ACM Press}
}

### Abstract

We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples ...

### Citations

9359 | Elements of information theory
- Cover, Thomas
- 1991

Citation Context: ...mation theory literature. One of its nicest properties is that it upper bounds other natural distance measures such as the $L_1$ distance: $L_1(D, \hat{D}) = \sum_{\vec{y} \in \{0,1\}^n} |D[\vec{y}] - \hat{D}[\vec{y}]|$. Thus it can be shown [8] that we always have $\sqrt{2 \ln 2 \cdot \mathrm{KL}(D \| \hat{D})} \geq L_1(D, \hat{D})$. It is also easily verified that if $D$ is any distribution over $\{0,1\}^n$ and $U$ is the uniform distribution, then $\mathrm{KL}(D \| U) \leq n$ (since we can always encode ...
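The bound quoted in this excerpt is a form of Pinsker's inequality with KL divergence measured in bits, and it is easy to check numerically. The following sketch uses hypothetical example distributions over $\{0,1\}^2$; the function names are illustrative, not from the paper:

```python
import math

def kl_bits(p, q):
    """KL(p || q) in bits, for distributions given as dicts outcome -> probability."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def l1(p, q):
    """L1 distance: sum over outcomes of |p[y] - q[y]|."""
    keys = set(p) | set(q)
    return sum(abs(p.get(y, 0.0) - q.get(y, 0.0)) for y in keys)

# Hypothetical target D and the uniform distribution U over {0,1}^2 (n = 2).
D = {"00": 0.4, "01": 0.3, "10": 0.2, "11": 0.1}
U = {y: 0.25 for y in D}

# Pinsker-style bound: L1(D, U) <= sqrt(2 ln 2 * KL(D || U)).
assert l1(D, U) <= math.sqrt(2 * math.log(2) * kl_bits(D, U))
# KL(D || U) <= n for any D over {0,1}^n.
assert kl_bits(D, U) <= 2
```

The bound is not tight here: $L_1(D, U) = 0.4$ while the right-hand side is roughly $0.46$, but closing the KL divergence always closes the $L_1$ distance.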

1761 | A theory of the learnable
- Valiant
- 1984

Citation Context: ...odel of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples [24], in the sense that we emphasize efficient and approximate learning, and we study the learnability of restricted classes of target distributions. The distribution classes we examine are often defined by ...

841 | Estimation of Dependences Based on Empirical Data
- Vapnik
- 1982

Citation Context: ...-neighbor approaches to the unsupervised learning problem often arise in the nonparametric setting. While we obviously cannot do justice to these areas here, the books of Duda and Hart [9] and Vapnik [25] provide excellent overviews and introductions to the pattern recognition work, as well as many pointers for further reading. See also Izenman's recent survey article [16]. Roughly speaking, our work ...

710 | The strength of weak learnability
- Schapire
- 1990

Citation Context: ...g the learning process. A function class with similar properties in the PAC model (that is, a class that is PAC learnable only if the hypothesis memorizes the training sample) provably does not exist [22]. Thus this construction is of some philosophical interest, since it is the first demonstration of a natural learning model in which the converse to Occam's Razor, namely that efficient learning implies...

668 | How to construct random functions
- Goldreich, Goldwasser, et al.
- 1986

Citation Context: ...learning with a generator, it is only for the powerful class of all polynomial-size circuit generators that we can prove hardness; the proof relies on the strong properties of pseudo-random functions [12]. 6.1 Hardness of Learning Probabilistic Finite Automata with an Evaluator: We define a class of distributions $\mathrm{PFA}_n$ over $\{0,1\}^n$ generated by probabilistic finite automata. A distribution in $\mathrm{PFA}_n$ is defin...

647 | Learnability and the Vapnik-Chervonenkis Dimension
- Blumer, Ehrenfeucht, et al.
- 1989

Citation Context: ...rithm provided it is given a sufficiently large sample. This powerful principle goes by the name Occam's Razor, and it can be verified for many learning models, including our distribution learning model [5, 6, 21, 14]. In the distribution-free PAC model, the converse to Occam's Razor can be shown to hold as well [11, 22]. Specifically, if any class of polynomial-size circuits over $\{0,1\}^n$ is efficiently learnable in...

387 | Decision theoretic generalizations of the PAC model for neural net and learning applications
- Haussler
- 1992

Citation Context: ...e coefficients, that is, $q_i = 1/(k \log m)$. To analyze our performance, we will take the standard approach of comparing the log-loss of our hypothesis on $S$ to the log-loss of the target distribution on $S$ [14]. We define the log-loss by $\mathrm{loss}(D, S) = \sum_{\vec{y} \in S} -\log D[\vec{y}]$, where $D[\vec{y}]$ denotes the probability that $\vec{y}$ is generated by the distribution $D$. Eventually we shall use the fact that for a sufficiently large sampl...
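The log-loss defined in this excerpt is straightforward to compute. A minimal sketch in Python, using base-2 logarithms and a hypothetical distribution and sample (both illustrative, not from the paper):

```python
import math

def log_loss(D, S):
    """loss(D, S) = sum over samples y in S of -log2 D[y], the log-loss in bits."""
    return sum(-math.log2(D[y]) for y in S)

# Hypothetical distribution over {0,1}^2 and a sample drawn from it.
D = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
S = ["00", "00", "01", "10"]

# Per-sample losses are 1, 1, 2, and 3 bits, so the total is 7 bits.
assert abs(log_loss(D, S) - 7.0) < 1e-9
```

Intuitively, a distribution that assigns high probability to the observed sample incurs low log-loss, which is why comparing a hypothesis' log-loss to the target's is a natural quality measure.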

311 | Cryptographic limitations on learning Boolean formulae and finite automata
- Kearns, Valiant
- 1994

Citation Context: ...he learning algorithm output a hypothesis of certain syntactic form, and representation-independent hardness results, in which a learning problem is shown hard regardless of the form of the hypothesis [20] and thus is inherently hard. While we seek only results of the second type, we must still specify whether it is learning with an evaluator or learning with a generator that is hard, or both. We prove...

272 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973

Citation Context: ...ensity. Nearest-neighbor approaches to the unsupervised learning problem often arise in the nonparametric setting. While we obviously cannot do justice to these areas here, the books of Duda and Hart [9] and Vapnik [25] provide excellent overviews and introductions to the pattern recognition work, as well as many pointers for further reading. See also Izenman's recent survey article [16]. Roughly spe...

242 | A simple unpredictable pseudo-random number generator
- Blum, Blum, et al.
- 1986

Citation Context: ...$\{0,1\}^n$. It is easy to verify that $\mathrm{HC}_n$ has both polynomial-size generators and evaluators. Theorem 18: The class $\mathrm{HC}_n$ is efficiently learnable with a generator, and under the Quadratic Residue Assumption [4] is not efficiently learnable with an evaluator. Proof: (Sketch) The hardness of learning with an evaluator is straightforward and omitted. The algorithm for learning with a generator simply takes a lar...

171 | Learning in the Presence of Malicious Errors
- Kearns, Li
- 1988

Citation Context: ...there is a complete covering of the second set of observations $S$ in our candidate centers set, but there will be a partial covering. We can then use the greedy heuristic for the partial cover problem [19] and conduct a similar analysis. (Theorem 10) In contrast to the covering approach taken in the algorithm of Theorem 10, the algorithm of the following theorem uses an equation-solving technique. Theo...

158 | A greedy heuristic for the set covering problem
- Chvatal
- 1979

Citation Context: ...structed an instance of set cover in which the optimal cover has cardinality at most $k$. By applying the greedy algorithm, we obtain a subcollection of at most $k \log m$ candidate centers that covers $S$ [7]. Let us assume without loss of generality that this subcollection is simply $\{\vec{x}'_1, \dots, \vec{x}'_{k \log m}\} = C'$. Our hypothesis distribution is this subcollection, with corruption probability $p$ and unif...
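The greedy set-cover heuristic invoked in this excerpt is simple to sketch. The instance below is a generic illustration, not the paper's candidate-centers construction:

```python
def greedy_set_cover(universe, subsets):
    """Greedy heuristic: repeatedly pick the subset that covers the most
    still-uncovered elements.  Chvatal's analysis bounds the resulting cover
    at roughly a ln|universe| factor times the optimal cover size."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("the given subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen

# Hypothetical instance: 6 elements, 4 candidate subsets.
U = set(range(6))
sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
cover = greedy_set_cover(U, sets)
assert set().union(*cover) == U
```

On this instance the greedy choice happens to find an optimal cover of two sets ({0, 1, 2} then {3, 4, 5}); in general it only guarantees the logarithmic approximation factor the excerpt relies on.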

112 | On the theory of average case complexity
- Ben-David, Chor, et al.
- 1992

103 | Recent developments in nonparametric density estimation
- Izenman
- 1991

Citation Context: ...Duda and Hart [9] and Vapnik [25] provide excellent overviews and introductions to the pattern recognition work, as well as many pointers for further reading. See also Izenman's recent survey article [16]. Roughly speaking, our work departs from the traditional statistical and pattern recognition approaches in two ways. First, we place explicit emphasis on the computational complexity of distribution ...

91 | On the computational complexity of approximating distributions by probabilistic automata
- Abe, Warmuth
- 1992

Citation Context: ...her different types of representations for a probability distribution $D$. The first representation, called an evaluator for $D$, takes as input any vector $\vec{y} \in \{0,1\}^n$, and outputs the real number $D[\vec{y}] \in [0,1]$, that is, the weight that $\vec{y}$ is given under $D$. The second and usually less demanding representation, called a generator for $D$, takes as input a string of truly random bits, and outputs a vector $\vec{y} \in$ ...

89 | Cryptographic primitives based on hard learning problems
- Blum, Furst, et al.
- 1993

Citation Context: ...respect to the uniform distribution can be embedded in the $\mathrm{PFA}_n$ learning problem. Thus we prove our theorem under the following conjecture, for which some evidence has been provided in recent papers [18, 3]. Conjecture 15 (Noisy Parity Assumption): There is a constant $0 < \eta < 1/2$ such that there is no efficient algorithm for learning parity functions under the uniform distribution in the PAC model with classifi...

73 | Average Case Completeness
- Gurevich
- 1991

Citation Context: ...utions that can be only generated efficiently, and distributions that can be both generated and evaluated efficiently. Similar distinctions have been made before in the context of average-case complexity [2, 13]. We now make these notions precise. We start by defining an efficient generator. Definition 1: Let $D_n$ be a class of distributions over $\{0,1\}^n$. We say that $D_n$ has polynomial-size generators if there are...

62 | Learning integer lattices
- Helmbold, Sloan, et al.
- 1992

Citation Context: ...orem 8: The class $\mathrm{PARITY}_n$ is efficiently exactly learnable with a generator and evaluator. Proof: The learning algorithm uses as a subroutine an algorithm for learning parity functions in the PAC model [10, 15] by solving a system of linear equations over the field of integers modulo 2. In the current context, this subroutine receives random examples of the form $\langle \vec{x}, f_S(\vec{x}) \rangle$, where $\vec{x} \in \{0,1\}^n$ is chosen unifo...
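Learning a parity function from noise-free examples by solving linear equations over GF(2), as this excerpt describes, can be sketched as follows. The representation choices (bitmask integers, the `recover_parity` helper, the example target) are illustrative, not from the paper:

```python
def recover_parity(examples, n):
    """Recover the hidden index set S of a parity function f_S(x) = XOR of the
    bits x_i with i in S, from noise-free examples (x, f_S(x)), by Gaussian
    elimination over GF(2).  Each x is an n-bit integer; returns S as an
    n-bit mask, or None if the examples leave the system underdetermined."""
    pivot_rows = {}                       # leading-bit position -> (row, rhs)
    for x, b in examples:
        for pos in range(n - 1, -1, -1):  # scan bits from high to low
            if not (x >> pos) & 1:
                continue
            if pos in pivot_rows:         # eliminate this leading bit
                px, pb = pivot_rows[pos]
                x ^= px
                b ^= pb
            else:                         # row becomes a new pivot
                pivot_rows[pos] = (x, b)
                break
    if len(pivot_rows) < n:
        return None
    # Back-substitute from the lowest pivot upward: the row pivoting at `pos`
    # involves only bits <= pos, so earlier solutions determine bit `pos`.
    sol = [0] * n
    for pos in range(n):
        x, b = pivot_rows[pos]
        sol[pos] = b ^ (sum(sol[q] for q in range(pos) if (x >> q) & 1) & 1)
    return sum(bit << i for i, bit in enumerate(sol))

# Hypothetical target parity S = {0, 4, 5, 7}, encoded as the mask 0b10110001.
n, S = 8, 0b10110001
examples = [(1 << i, (S >> i) & 1) for i in range(n)]  # basis-vector examples
examples.append((0b11111111, bin(0b11111111 & S).count("1") & 1))
assert recover_parity(examples, n) == S
```

The basis-vector examples guarantee a full-rank system here; with uniformly random examples, as in the excerpt, a modest multiple of $n$ draws yields full rank with high probability.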

25 | On Learning Ring-Sum Expansions
- Fischer, Simon

Citation Context: ...orem 8: The class $\mathrm{PARITY}_n$ is efficiently exactly learnable with a generator and evaluator. Proof: The learning algorithm uses as a subroutine an algorithm for learning parity functions in the PAC model [10, 15] by solving a system of linear equations over the field of integers modulo 2. In the current context, this subroutine receives random examples of the form $\langle \vec{x}, f_S(\vec{x}) \rangle$, where $\vec{x} \in \{0,1\}^n$ is chosen unifo...

20 | Inclusion-exclusion: Exact and approximate
- Kahn, Linial, et al.
- 1996

Citation Context: ...mong (at most) $k+1$ sets of inputs. Naively, we would explicitly compute the sizes of all $2^{k+1}$ intersections. We can obtain an improved bound by using a new result of Kahn, Linial, and Samorodnitsky [17] which shows that the sizes of all $2^{k+1}$ intersections are in fact uniquely determined by the sizes of all intersections of at most $2 \log k + 2$ sets. More formally: Lemma 6 (Implicit in Kahn et al. [1...

18 | Efficient distribution-free learning of probabilistic concepts
- Kearns, Schapire
- 1990

Citation Context: ...rithm provided it is given a sufficiently large sample. This powerful principle goes by the name Occam's Razor, and it can be verified for many learning models, including our distribution learning model [5, 6, 21, 14]. In the distribution-free PAC model, the converse to Occam's Razor can be shown to hold as well [11, 22]. Specifically, if any class of polynomial-size circuits over $\{0,1\}^n$ is efficiently learnable in...

4 | Efficient noise-tolerant learning from statistical queries
- Kearns
- 1993

Citation Context: ...respect to the uniform distribution can be embedded in the $\mathrm{PFA}_n$ learning problem. Thus we prove our theorem under the following conjecture, for which some evidence has been provided in recent papers [18, 3]. Conjecture 15 (Noisy Parity Assumption): There is a constant $0 < \eta < 1/2$ such that there is no efficient algorithm for learning parity functions under the uniform distribution in the PAC model with classifi...

1 | An improved boosting algorithm and its implications on learning complexity
- Freund, Yoav
- 1992

Citation Context: ...nd it can be verified for many learning models, including our distribution learning model [5, 6, 21, 14]. In the distribution-free PAC model, the converse to Occam's Razor can be shown to hold as well [11, 22]. Specifically, if any class of polynomial-size circuits over $\{0,1\}^n$ is efficiently learnable in the distribution-free PAC model, then it is efficiently learnable by an algorithm whose hypothesis is a bo...

1 | The complexity of enumeration and reliability problems
- Valiant
- 1979

Citation Context: ...k_n does not have polynomial-size evaluators, unless $\#P \subseteq P/\mathrm{poly}$. Proof: We use the fact that exactly counting the number of satisfying assignments to a monotone 2-CNF formula is a $\#P$-complete problem [23]. The circuit $C_n$ will have inputs $x_1, \dots, x_n$ that will correspond to the variables of a monotone 2-CNF formula, and also inputs $z_{i,j}$ for each possible monotone clause $(x_i \vee x_j)$. The outputs will consis...