## Learning Simple Concepts Under Simple Distributions (1991)

Venue: | SIAM Journal on Computing |

Citations: | 55 - 3 self |

### BibTeX

@ARTICLE{Li91learningsimple,
    author = {Ming Li and Paul M. B. Vitányi},
    title = {Learning Simple Concepts Under Simple Distributions},
    journal = {SIAM Journal on Computing},
    year = {1991},
    volume = {20},
    pages = {911--935}
}

### Abstract

We aim at developing a learning theory where 'simple' concepts are easily learnable. In Valiant's learning model, many concepts turn out to be too hard (like NP-hard) to learn. Relatively few concept classes were shown to be learnable polynomially. In daily life, it seems that things we care to learn are usually learnable. To model the intuitive notion of learning more closely, we do not require that the learning algorithm learns (polynomially) under all distributions, but only under all simple distributions. A distribution is simple if it is dominated by an enumerable distrib...
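
The abstract is cut off in the middle of its definition of a simple distribution. As a hedged reading only, the domination relation it refers to can be written in the same form as condition (1) quoted in the citation contexts further down this page. The symbols $P$ (the distribution being tested) and $Q$ (the enumerable dominating distribution) are names assumed here for illustration; the full paper should be consulted for the exact definition.

$$
P \text{ is simple} \iff \exists\, \text{enumerable } Q \;\; \exists c > 0 \;\; \forall x \; [\, c\, Q(x) \geq P(x) \,]
$$

Under this reading, the universal distribution $m$ plays the role of a single $Q$ that dominates every enumerable $P_i$, which is why learnability under $m$ is tied to learnability under all simple distributions.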

### Citations

1771 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitanyi
- 1993
Citation Context ... is customary in mathematics to view such a circumstance as evidence that we are dealing with a fundamental notion. See [ZL] for the analogous concepts in continuous sample spaces; also see [G2], and [LV1] or [LV2] for elaboration of the cited facts and proofs. This universal distribution has many important properties. Under m, easily describable objects have high probability, and complex or random obj... |

1747 | A theory of the learnable
- Valiant
- 1984
Citation Context ...d a learning theory, where one wants to learn a concept with high probability, in polynomial time, and a polynomial number of examples, within a certain error, under all distributions on the examples [V]. A precise definition of this 'pac-learning' is given in Section 1.2. Let us highlight its special features. It contrasts with the common approach in statistical inference, or recursion theoretical l... |

705 | Approximation algorithms for combinatorial problems
- Johnson
- 1973
Citation Context ... monomials may also be in S. Finding all of the original monomials of f precisely is NP-hard. For the purpose of learning it is sufficient to approximate f. We use the following result due to Johnson [J] and Lovász [Lo], Claim 3. Given sets $A_1, \ldots, A_n$, such that $\bigcup_{i=1}^{n} A_i = A = \{1, \ldots, q\}$. If there exist $k$ sets $A_{i_1}, \ldots, A_{i_k}$ such that $A = \bigcup_{j=1}^{k} A_{i_j}$, then it is possible to fin... |

684 | Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm - Littlestone - 1988 |

524 | Learning regular sets from queries and counterexamples - Angluin - 1987 |

348 | Measure theory
- Halmos
- 1950
Citation Context ... each resulting set having an appropriate $\mu$-measure. The resulting sets are called Borel sets, and form a so-called $\sigma$-algebra denoted by, say, $\sigma$. The pair $(\sigma, \mu)$ is called a probability field. See [Ha]. Example. The discrete probability distributions we considered before, actually probability densities, correspond to measures with $B = N \cup \{u\}$ and the sample space restricted to $\{x : x \in \Omega,\ l(x) = 1$... |

273 | On the ratio of optimal integral and fractional covers
- Lovász
- 1975
Citation Context ...also be in S. Finding all of the original monomials of f precisely is NP-hard. For the purpose of learning it is sufficient to approximate f. We use the following result due to Johnson [J] and Lovász [Lo], Claim 3. Given sets $A_1, \ldots, A_n$, such that $\bigcup_{i=1}^{n} A_i = A = \{1, \ldots, q\}$. If there exist $k$ sets $A_{i_1}, \ldots, A_{i_k}$ such that $A = \bigcup_{j=1}^{k} A_{i_j}$, then it is possible to find in polynomial ... |

196 | The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms
- Zvonkin, Levin
- 1970
Citation Context ...t formalizations turn out to define the same notion of universal probability. It is customary in mathematics to view such a circumstance as evidence that we are dealing with a fundamental notion. See [ZL] for the analogous concepts in continuous sample spaces; also see [G2], and [LV1] or [LV2] for elaboration of the cited facts and proofs. This universal distribution has many important properties. Und... |

192 | Computational Limitations on Learning from Examples
- Pitt, Valiant
- 1986
Citation Context ...y learnable under m (and hence under all simple distributions in our model). A Boolean formula is monotone if no variable in it is negated. A k-term DNF is a DNF consisting of at most k monomials. In [PV] it was shown that learning a monotone k-term DNF by k-term (or 2k-term) DNF is NP-complete (See also [KLPV]). Theorem 5. The class of monotone k-term DNF is polynomially learnable by monotone k-term D... |

170 | On the learnability of boolean formulae
- Kearns, Li, et al.
- 1987
Citation Context ... if no variable in it is negated. A k-term DNF is a DNF consisting of at most k monomials. In [PV] it was shown that learning a monotone k-term DNF by k-term (or 2k-term) DNF is NP-complete (See also [KLPV]). Theorem 5. The class of monotone k-term DNF is polynomially learnable by monotone k-term DNF, under m. Proof. Assume we are learning a monotone k-term DNF $f(x_1, \ldots, x_n) = m_1 + \cdots + m_k$, ... |

169 | Inference of reversible languages - Angluin - 1982 |

168 | Learning in the presence of malicious errors
- Kearns, Li
- 1993
Citation Context ...ve examples. These models are basically equivalent. Also see [HLW] for an online model. We need the following very useful theorem proved by Blumer, Ehrenfeucht, Haussler, and Warmuth [BEHW]. See also [KL] for the case when the concept is only consistent with a fraction of the examples. Occam's Razor Theorem. Let $C$ and $C'$ be concept classes. Let $c \in C$ be the target concept, and let n be the length of it... |

110 | On the theory of average case complexity
- Ben-David, Chor, et al.
- 1992
Citation Context ... not contain a universal distribution - there is a universal distribution for this class but not in it. It has been shown, however, that the class of polynomial samplable distributions, as defined in [BCGL], contains a universal distribution. One may also further restrict our assumption to even narrower classes to make the theory practically usable. An entirely similar set of considerations holds for th... |

99 | Laws of information conservation (non-growth) and aspects of the foundation of probability theory, Problems of Information Transmission 10 - Levin - 1974 |

79 | On the synthesis of finite-state machines from samples of their behaviour - Biermann, Feldman - 1972 |

73 | Diversity-based inference of finite automata
- Rivest, Schapire
- 1994
Citation Context ...tion. Because it does not know the real distribution, the robot just has to generate its own examples according to its own (computable) distribution and do experiments to classify these examples (See [RS]). For example, in case of learning a finite state black box (with resetting mechanism and observable accepting/rejecting behavior). Putting a man on the moon, we cannot learn according to the real d... |

68 | Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension
- Blumer, Ehrenfeucht, et al.
- 1986
Citation Context ...sitive and negative examples. These models are basically equivalent. Also see [HLW] for an online model. We need the following very useful theorem proved by Blumer, Ehrenfeucht, Haussler, and Warmuth [BEHW]. See also [KL] for the case when the concept is only consistent with a fraction of the examples. Occam's Razor Theorem. Let $C$ and $C'$ be concept classes. Let $c \in C$ be the target concept, and let n be t... |

60 | Inductive reasoning and Kolmogorov complexity
- Li, Vitanyi
- 1992
Citation Context ...mary in mathematics to view such a circumstance as evidence that we are dealing with a fundamental notion. See [ZL] for the analogous concepts in continuous sample spaces; also see [G2], and [LV1] or [LV2] for elaboration of the cited facts and proofs. This universal distribution has many important properties. Under m, easily describable objects have high probability, and complex or random objects have... |

40 | Learnability by fixed distributions
- Benedek, Itai
- 1988
Citation Context ...cepts in $\Omega$ is learnable under M iff it is learnable under each simple measure. Proof. The 'if' part holds vacuously. We only need to prove the 'only if' part. We use some definitions and results from [BI]. According to [BI], $C_\epsilon$ is an $\epsilon$-cover of $C$, with respect to distribution $\mu$, if for every $c \in C$ there is a $c' \in C_\epsilon$ which is $\epsilon$-close to $c$ ($\mu(c \,\Delta\, c') \leq \epsilon$). A concept class C is finitely coverable, if for ... |

34 | On learning boolean functions - Natarajan - 1987 |

5 | Expected mistake bounds for on-line learning algorithms
- Haussler, Littlestone, et al.
- 1988
Citation Context ...ble by C', or simply, C is learnable. Remark. A different model as used by [V,KLPV] assumes separate distributions over positive and negative examples. These models are basically equivalent. Also see [HLW] for an online model. We need the following very useful theorem proved by Blumer, Ehrenfeucht, Haussler, and Warmuth [BEHW]. See also [KL] for the case when the concept is only consistent with a fract... |

3 | Learning decision-lists
- Rivest
- 1987
Citation Context ...an, say, $c \log n$. We know this is not possible [LV1].) An interesting open question remains: is log n-decision list polynomially learnable under m(x)? A log n-decision list is a decision list of Rivest [R] with each term having Kolmogorov complexity $O(\log n)$. 2.3. Simple DNF We learn a more general class of DNF formulae in this section. This time, we allow each term to have very high Kolmogorov comple... |

1 | On the symmetry of algorithmic information
- Gács
Citation Context ...y m, such that $\forall i \in N^{+}\ \exists c > 0\ \forall x \in N\ [\, c\, m(x) \geq P_i(x) \,]$. (1) That is, m dominates each $P_i$ multiplicatively. Let $K(x)$ be the prefix variant of Kolmogorov complexity first proposed by L.A. Levin [L,G1]. This is defined as follows. Consider an enumeration $T_1, T_2, \ldots$ of Turing machines with a separate binary one-way input tape. Let T be such a machine. If T halts with output x, then T has scanne... |

1 | Lecture notes on descriptional complexity and randomness
- Gács
- 1987
Citation Context ...bility. It is customary in mathematics to view such a circumstance as evidence that we are dealing with a fundamental notion. See [ZL] for the analogous concepts in continuous sample spaces; also see [G2], and [LV1] or [LV2] for elaboration of the cited facts and proofs. This universal distribution has many important properties. Under m, easily describable objects have high probability, and complex or random objects have... |
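
The citation contexts for Johnson [J] and Lovász [Lo] above quote a covering claim whose statement is truncated on this page. As a rough illustration only, the classical greedy set-cover heuristic that those two references analyse can be sketched as follows. The function name `greedy_set_cover`, the set labels, and the toy instance are all assumptions made for this example; nothing here is taken from the paper's own proof.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[int],
                     sets: Dict[Hashable, Set[int]]) -> List[Hashable]:
    """Greedily pick sets until every element of `universe` is covered.

    Hypothetical helper for illustration: returns the labels of the chosen
    sets in the order they were picked.
    """
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered:
        if not sets:
            raise ValueError("no sets were given")
        # Pick the set that covers the most still-uncovered elements.
        best = max(sets, key=lambda label: len(sets[label] & uncovered))
        gained = sets[best] & uncovered
        if not gained:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    # Toy instance (assumed): A = {1, ..., 6}, coverable by k = 2 sets.
    universe = set(range(1, 7))
    family = {
        "A1": {1, 2, 3},
        "A2": {4, 5, 6},
        "A3": {1, 4},
        "A4": {2, 5},
        "A5": {3, 6},
    }
    print(greedy_set_cover(universe, family))
```

The standard guarantee for this heuristic is that, when some k of the sets cover a universe of q elements, the greedy choice uses at most about k(1 + ln q) sets and runs in polynomial time. Whether the paper's Claim 3 states exactly this bound cannot be confirmed from the truncated quotation.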