## The Strength of Weak Learnability (1990)

Venue: Machine Learning

Citations: 666 (22 self)

### BibTeX

```bibtex
@article{Schapire90thestrength,
  author  = {Robert E. Schapire},
  title   = {The Strength of Weak Learnability},
  journal = {Machine Learning},
  volume  = {5},
  number  = {2},
  pages   = {197--227},
  year    = {1990}
}
```

### Abstract

This paper addresses the problem of improving the accuracy of a hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output a hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce a hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
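The error reduction behind this equivalence can be illustrated numerically: in Schapire's construction, a majority vote over three sub-hypotheses, each with error β, has error g(β) = 3β² − 2β³, and applying the construction recursively drives the error toward zero. A minimal numeric sketch (the function names here are illustrative, not the paper's notation):

```python
def g(beta):
    """Error of the majority vote over the three sub-hypotheses in
    Schapire's construction, when each sub-hypothesis errs with
    probability beta: g(beta) = 3*beta**2 - 2*beta**3."""
    return 3 * beta**2 - 2 * beta**3

def boosted_error(gamma, levels):
    """Toy illustration: start from a weak learner with error
    1/2 - gamma and apply the majority-of-three recursion `levels`
    times, returning the resulting error bound."""
    beta = 0.5 - gamma
    for _ in range(levels):
        beta = g(beta)
    return beta
```

Note that g(1/2) = 1/2, so a learner no better than random guessing gains nothing, while any edge gamma > 0 is amplified; e.g. `boosted_error(0.1, 10)` is already far below 0.001.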

### Citations

1496 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963

Citation Context: ... ε/4, completing the proof. To bound the number of examples needed to estimate a1 and ε, we will make use of the following bounds on the tails of a binomial distribution (Angluin and Valiant, 1979; Hoeffding, 1963). LEMMA 4. (Chernoff Bounds) Consider a sequence of m independent Bernoulli trials, each succeeding with probability p. Let S be the random variable describing the total number of successes. Then for... |
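The quoted Lemma 4 is cut off in this excerpt; the multiplicative Chernoff bounds in the Angluin–Valiant style that it refers to take roughly the following form (a reconstruction from the standard literature, not a verbatim quote of the paper's statement):

```latex
\Pr\bigl[S \ge (1+\gamma)\,pm\bigr] \le e^{-\gamma^2 pm/3},
\qquad
\Pr\bigl[S \le (1-\gamma)\,pm\bigr] \le e^{-\gamma^2 pm/2},
\qquad 0 \le \gamma \le 1,
```

where S is the number of successes among the m Bernoulli trials with success probability p.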

650 | Queries and concept learning - Angluin - 1988 |

626 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1987 |

375 | Learning decision lists - Rivest - 1987

Citation Context: ...one described for k-term DNF in Section 5.3, and it is possible to find similar algorithms for a number of other concept classes that are already known to be learnable (for example, k-decision lists (Rivest, 1987) and rank r decision trees (Ehrenfeucht and Haussler, 1989)). To what extent will this approach be fruitful for other classes not presently known to be learnable? This is an open question. Another op... |

306 | Cryptographic limitations on learning boolean formulae and finite automata - Kearns, Valiant - 1989 |

256 | Fast probabilistic algorithms for Hamiltonian circuits and matchings - Angluin, Valiant - 1979

Citation Context: ... > ε/4. We conclude w + y > ε/4, completing the proof. To bound the number of examples needed to estimate a1 and ε, we will make use of the following bounds on the tails of a binomial distribution (Angluin and Valiant, 1979; Hoeffding, 1963). LEMMA 4. (Chernoff Bounds) Consider a sequence of m independent Bernoulli trials, each succeeding with probability p. Let S be the random variable describing the total number of su... |

209 | Finding patterns common to a set of strings - Angluin - 1980 |

191 | Computational limitations on learning from examples - Pitt, Valiant - 1988

Citation Context: ...ples. We briefly argue that this hypothesis is, with high probability, (1/2 - f(1/nk))-close to the target concept. First, note that the target k-term DNF formula is equivalent to some k-CNF formula (Pitt and Valiant, 1988). (A formula in conjunctive normal form (CNF) is one written as the conjunction of clauses, each clause a disjunction of literals. If each clause consists of only k literals, then the formula is in k... |
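The k-term DNF to k-CNF equivalence invoked in this context is the standard distributivity identity (written here in generic notation, not necessarily the paper's): for terms T_1, ..., T_k, each viewed as a set of literals,

```latex
T_1 \vee \cdots \vee T_k
\;=\;
\bigwedge_{(\ell_1,\ldots,\ell_k)\,\in\,T_1\times\cdots\times T_k}
(\ell_1 \vee \cdots \vee \ell_k),
```

and each resulting clause contains at most k literals, so the right-hand side is a k-CNF formula.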

168 | Learning Boolean formulas - Kearns, Li, et al. - 1994 |

90 | Equivalence of models for polynomial learnability - HAUSSLER, KEARNS, et al. - 1988 |

66 | Learning decision trees from random examples - Ehrenfeucht, Haussler - 1989 |

53 | Predicting {0, 1}-functions on randomly drawn points - Haussler, Littlestone, et al. - 1994 |

42 | On the necessity of Occam Algorithms - Board, Pitt - 1990 |

40 | Learning nested differences of intersection-closed concept classes - Helmbold, Sloan, et al. - 1990 |

27 | Learning Boolean formulae or finite automata is as hard as factoring - Kearns, Valiant - 1988 |

18 | Space-bounded learning and the Vapnik-Chervonenkis dimension - Floyd - 1989 |

5 | Expected mistake bounds for on-line learning algorithms - Haussler, Littlestone, et al. - 1988 |

4 | Some remarks about space-complexity of learning, and circuit complexity of recognizing - Boucheron, Sallantin - 1988 |

3 | The Computational Complexity of Machine Learning. Doctoral dissertation - Kearns - 1989 |

2 | Predicting {0, 1}-functions on randomly drawn points - Haussler, Littlestone, et al. - 1990 |

1 | On learning a union of half spaces. Unpublished manuscript - Baum - 1989 |

1 | Space efficient learning algorithms (Technical Report UCSC-CRL-88-2 - Haussler - 1988 |

1 | Pattern languages are not learnable. Unpublished manuscript - Schapire - 1989

Citation Context: ... set P/poly (NP/poly) consists of those languages accepted by a family of polynomial-size deterministic (nondeterministic) circuits.) Furthermore, since learning pattern languages was recently shown (Schapire, 1989) to be as hard as learning NP/poly, this result shows that pattern languages are also unlearnable under this relatively weak structural assumption. THEOREM 7. Suppose C is learnable... |

1 | Occam's razor, Information Processing Letters - Blumer, Ehrenfeucht, et al. - 1987 |
