## Efficient noise-tolerant learning from statistical queries (1998)

Venue: Journal of the ACM

Citations: 289 (5 self)

### BibTeX

@ARTICLE{Kearns98efficientnoise-tolerant,
  author    = {Michael Kearns},
  title     = {Efficient noise-tolerant learning from statistical queries},
  journal   = {Journal of the ACM},
  volume    = {45},
  number    = {6},
  pages     = {983--1006},
  year      = {1998},
  publisher = {ACM Press}
}


### Abstract

In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the information-theoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noise-tolerant algorithm is known.
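
The statistical query model described in the abstract can be illustrated with a short sketch: instead of seeing labeled examples, the learner submits a predicate χ over (example, label) pairs together with a tolerance τ, and receives an additive-error estimate of Pr[χ(x, f(x)) = 1]. The names and the toy target below are illustrative, not from the paper.

```python
import random

random.seed(0)

def sq_oracle(draw, chi, tau):
    """Hypothetical STAT oracle: estimate Pr[chi(x, f(x)) = 1] over random
    labeled examples, to within additive tolerance tau (w.h.p.)."""
    n = max(1, int(4 / tau ** 2))            # Chernoff: O(1/tau^2) samples suffice
    return sum(chi(*draw()) for _ in range(n)) / n

def draw():
    """Random example for a toy target f(x) = x1 AND x2 over {0,1}^3."""
    x = tuple(random.randint(0, 1) for _ in range(3))
    return x, x[0] & x[1]

# Query: how often does the first input bit agree with the label?
est = sq_oracle(draw, lambda x, y: int(x[0] == y), tau=0.05)
```

The point of the model is that the learner only ever touches such aggregate estimates, never individual examples, which is exactly what makes simulation from noisy data possible.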

### Citations

1706 | A theory of the learnable
- Valiant
- 1984
Citation Context ...fficient Noise-Tolerant Learning From Statistical Queries Michael Kearns AT&T Bell Laboratories Murray Hill, New Jersey 1 Introduction In this paper, we study the extension of Valiant's learning model [25] in which the positive or negative classification label provided with each random example may be corrupted by random noise. This extension was first examined in the learning theory literature by Anglu... |

947 |
On the uniform convergence of the relative frequencies of events to their probabilities. Theory Prob
- Vapnik, Chervonenkis
- 1971
Citation Context ... 2^d possible binary labelings of the points in S, there is a function in F that agrees with that labeling. The Vapnik-Chervonenkis dimension of F is the cardinality of the largest set shattered by F [27]. 5 Simulating Statistical Queries Using Noisy Examples Our first theorem formalizes the intuition given above that learning from statistical queries implies learning in the noise-free Valiant model. T... |
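
The simulation idea named in this context, recovering a clean statistic from noisy labels, can be illustrated in its simplest case: estimating p = Pr[f(x) = 1] when each label is flipped with known rate η < 1/2. The observed noisy frequency satisfies q = p(1 − 2η) + η, which can be inverted. This is a minimal sketch under those assumptions, not the paper's general construction.

```python
import random

random.seed(1)
ETA = 0.3      # assumed known classification-noise rate, strictly below 1/2
P_TRUE = 0.6   # Pr[f(x) = 1] for a hypothetical target

def noisy_label():
    y = 1 if random.random() < P_TRUE else 0
    return 1 - y if random.random() < ETA else y   # flip the label w.p. ETA

n = 200_000
q = sum(noisy_label() for _ in range(n)) / n   # observed noisy frequency
p_hat = (q - ETA) / (1 - 2 * ETA)              # invert q = p(1 - 2*eta) + eta
```

As η approaches 1/2 the denominator shrinks, so more samples are needed for the same accuracy, which is why 1/2 is the information-theoretic barrier.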

627 |
Learnability and the VapnikChervonenkis dimension
- Blumer, Ehrenfeucht, et al.
- 1989
Citation Context ...re centers on the tradeoff between the number of statistical queries that must be made, and the required accuracy of these queries. For instance, translation of Valiant model sample size lower bounds [3, 4] into the statistical query model leaves open the possibility that some classes might be learned with just a single statistical query of sufficiently small allowed approximation error. Here we dismiss... |

596 | An Introduction to Computational Learning Theory - Kearns, Vazirani - 1994 |

517 |
Perceptrons: An introduction to computational geometry
- Minsky, Papert
- 1969
Citation Context ... is the uniform distribution on the unit sphere (or any other radially symmetric distribution). Despite the voluminous literature on learning perceptrons in general (see the work of Minsky and Papert [17] for a partial bibliography) and with respect to this distribution in particular [23, 2, 6], no efficient noise-tolerant learning algorithm has been given previously. Here we give a very simple and ef... |

375 | Learning decision lists
- Rivest
- 1987
Citation Context ...ill see a somewhat detailed example of this approach momentarily. A partial list of the efficient algorithms employing some version of this approach is: Rivest's algorithm for learning decision lists [19]; Haussler's algorithm for learning boolean conjunctions with few relevant variables [8]; the algorithm of Blumer et al. for learning a union of axis-aligned rectangles in the Euclidean plane; and the... |

281 |
Constant depth circuits, Fourier transform, and learnability
- Linial, Mansour, et al.
- 1989
Citation Context ...ing n-dimensional axis-aligned rectangles with noise; learning AC0 with noise with respect to the uniform distribution in time O(n^{poly(log n)}) (for which the algorithm of Linial, Mansour and Nisan [16] can be shown to fall into the statistical query model without modification); and many others. The fact that practically every concept class known to be efficiently learnable in the Valiant model can ... |

224 |
Quantifying Inductive Bias: AI Learning Algorithms and VALIANT’s Learning Framework
- Haussler
- 1988
Citation Context ...fficient algorithms employing some version of this approach is: Rivest's algorithm for learning decision lists [19]; Haussler's algorithm for learning boolean conjunctions with few relevant variables [8]; the algorithm of Blumer et al. for learning a union of axis-aligned rectangles in the Euclidean plane; and the algorithm of Kearns and Pitt [13] for learning pattern languages with respect to produc... |

222 |
Learning from noisy examples
- Angluin, Laird
- 1987
Citation Context ... positive or negative classification label provided with each random example may be corrupted by random noise. This extension was first examined in the learning theory literature by Angluin and Laird [1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible rate of noise. In addition to being the subject of a number of theoretical studies [1... |

198 | Efficient distribution-free learning of probabilistic concepts
- Kearns, Schapire
- 1994
Citation Context ...ning boolean conjunctions that tolerates a noise rate approaching the information-theoretic barrier of 1/2. Subsequently, there have been some isolated instances of efficient noise-tolerant algorithms [14, 20, 22], but little work on characterizing which classes can be efficiently learned in the presence of noise, and no general transformations of Valiant model algorithms into noise-tolerant algorithms. The pr... |

193 |
A General Lower Bound on the Number of Examples Needed for Learning
- Ehrenfeucht, Haussler, et al.
- 1988
Citation Context ...re centers on the tradeoff between the number of statistical queries that must be made, and the required accuracy of these queries. For instance, translation of Valiant model sample size lower bounds [3, 4] into the statistical query model leaves open the possibility that some classes might be learned with just a single statistical query of sufficiently small allowed approximation error. Here we dismiss... |

191 | Computational limitations on learning from examples
- Pitt, Valiant
- 1988
Citation Context ...s be quite significant, as previous results have demonstrated concept classes F for which the choice of hypothesis representation can mean the difference between intractability and efficient learning [18, 12]. by p(1/ε, 1/δ, n, size(f)) and output a representation in H of a function h that with probability at least 1 − δ satisfies error(h) ≤ ε. This probability is taken over the random draws fr... |

168 | Learning Boolean formulas
- Kearns, Li, et al.
- 1994
Citation Context ...s be quite significant, as previous results have demonstrated concept classes F for which the choice of hypothesis representation can mean the difference between intractability and efficient learning [18, 12]. by p(1/ε, 1/δ, n, size(f)) and output a representation in H of a function h that with probability at least 1 − δ satisfies error(h) ≤ ε. This probability is taken over the random draws fr... |

166 | Learning in the presence of malicious errors
- Kearns, Li
- 1987
Citation Context ...1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible rate of noise. In addition to being the subject of a number of theoretical studies [1, 15, 24, 11], the classification noise model has become a common paradigm for experimental machine learning research. Angluin and Laird provided an algorithm for learning boolean conjunctions that tolerates a noi... |

118 | Weakly learning DNF and characterizing statistical query learning using Fourier analysis - Blum, Furst, et al. - 1994 |

117 |
Learning disjunction of conjunctions
- Valiant
- 1985
Citation Context ...iven to the learner; the inputs x given to the learner remain independently distributed according to D. Other models allowing corruption of the input as well as the label have been studied previously [26, 11], with considerably less success in finding efficient error-tolerant algorithms. Here we will concentrate primarily on the classification noise model, although in Section 9 we will examine a more real... |

62 |
Learning integer lattices
- Helmbold, Sloan, et al.
- 1992
Citation Context ...the parity of some unknown subset of the boolean variables x1, …, xn), which is known to be efficiently learnable in the Valiant model via the solution of a system of linear equations modulo 2 [9], is not efficiently learnable from statistical queries. The fact that the separation of the two models comes via this class is of particular interest, since the parity class has no known efficient no... |
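
The mod-2 linear-algebra approach this context alludes to can be sketched directly: each noise-free example (x, y) of a parity function is one linear equation over GF(2), and Gaussian elimination recovers a consistent hypothesis. This is a minimal illustration with made-up names, not the algorithm of [9] verbatim.

```python
import random

random.seed(2)
N = 6
secret = [random.randint(0, 1) for _ in range(N)]   # unknown parity subset

def example():
    """Noise-free labeled example: y = <secret, x> mod 2."""
    x = [random.randint(0, 1) for _ in range(N)]
    return x, sum(s & a for s, a in zip(secret, x)) % 2

rows = [example() for _ in range(4 * N)]
aug = [x + [y] for x, y in rows]        # augmented matrix [A | y] over GF(2)

# Gaussian elimination mod 2 (XOR is addition in GF(2))
rank = 0
for col in range(N):
    piv = next((r for r in range(rank, len(aug)) if aug[r][col]), None)
    if piv is None:
        continue                         # free column, no pivot
    aug[rank], aug[piv] = aug[piv], aug[rank]
    for r in range(len(aug)):
        if r != rank and aug[r][col]:
            aug[r] = [a ^ b for a, b in zip(aug[r], aug[rank])]
    rank += 1

# Read off a consistent hypothesis (free variables set to 0)
h = [0] * N
for row in aug:
    lead = next((c for c in range(N) if row[c]), None)
    if lead is not None:
        h[lead] = row[N]
```

The contrast with the SQ lower bound is that this procedure inspects individual examples; no way to simulate it with aggregate probability estimates is possible, by the paper's result.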

61 | A polynomialtime algorithm for learning noisy linear threshold functions - Blum, Frieze, et al. - 1998 |

60 | The Computational Complexity of Machine Learning
- Kearns
- 1990
Citation Context ...aliant model also efficiently learnable with noise? Note that any counterexamples to such equivalences should not depend on syntactic hypothesis restrictions, but should be representation independent [10]. Acknowledgements Thanks to Umesh Vazirani for the early conversations from which this research grew, to Rob Schapire for many insightful comments and his help with the proof of Theorem 5, and to Jay... |

60 |
The design and analysis of efficient learning algorithms
- Schapire
- 1992
Citation Context ...ning boolean conjunctions that tolerates a noise rate approaching the information-theoretic barrier of 1/2. Subsequently, there have been some isolated instances of efficient noise-tolerant algorithms [14, 20, 22], but little work on characterizing which classes can be efficiently learned in the presence of noise, and no general transformations of Valiant model algorithms into noise-tolerant algorithms. The pr... |

51 | Learning from Good and Bad Data - Laird - 1988 |

47 |
Types of noise in data for concept learning
- Sloan
- 1988
Citation Context ...1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible rate of noise. In addition to being the subject of a number of theoretical studies [1, 15, 24, 11], the classification noise model has become a common paradigm for experimental machine learning research. Angluin and Laird provided an algorithm for learning boolean conjunctions that tolerates a noi... |

45 |
Statistical mechanics of learning from examples
- Seung, Sompolinsky, et al.
- 1992
Citation Context ...ribution). Despite the voluminous literature on learning perceptrons in general (see the work of Minsky and Papert [17] for a partial bibliography) and with respect to this distribution in particular [23, 2, 6], no efficient noise-tolerant learning algorithm has been given previously. Here we give a very simple and efficient algorithm for learning from statistical queries (and thus an algorithm tolerating n... |
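
One way to see why radial symmetry helps, in the spirit of the SQ-style estimators the context refers to: for a halfspace through the origin under a radially symmetric input distribution, the expectation E[y·x] points along the target normal vector, and each coordinate of that expectation is a single statistical query. This sketch is an illustration under those assumptions, not necessarily the paper's exact algorithm.

```python
import math
import random

random.seed(3)
D = 5
w_star = [1.0] + [0.0] * (D - 1)   # hypothetical target halfspace through origin

def example():
    # Standard Gaussian input: one radially symmetric distribution
    x = [random.gauss(0.0, 1.0) for _ in range(D)]
    y = 1 if sum(w * a for w, a in zip(w_star, x)) >= 0 else -1
    return x, y

# Averaging estimator: each coordinate of E[y * x] is an expectation estimate,
# i.e. a statistical query; the averaged vector aligns with w_star.
n = 50_000
w = [0.0] * D
for _ in range(n):
    x, y = example()
    for i in range(D):
        w[i] += y * x[i] / n

# Cosine similarity between the estimate and the target normal
cos = sum(a * b for a, b in zip(w, w_star)) / (
    math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_star)))
```

Because the estimator is built entirely from expectations, Theorem 3's simulation makes it tolerant to classification noise for free.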

45 | General bounds on statistical query learning and PAC learning with noise via hypothesis boosting - Aslam, Decatur - 1998 |

35 | Learning noisy perceptrons by a perceptron in polynomial time - Cohen - 1997 |

33 | Specification and simulation of statistical query algorithms for efficiency and noise tolerance - Aslam, Decatur - 1998 |

27 | Improved learning of AC 0 functions - Furst, Jackson, et al. - 1991 |

26 | On learning ring-sum-expansions - Fischer, Simon - 1992 |

21 | A polynomial-time algorithm for learning k-variable pattern languages from examples
- Kearns, Pitt
- 1989
Citation Context ...rning boolean conjunctions with few relevant variables [8]; the algorithm of Blumer et al. for learning a union of axis-aligned rectangles in the Euclidean plane; and the algorithm of Kearns and Pitt [13] for learning pattern languages with respect to product distributions. In its original form, the covering method is not noise-tolerant, and indeed with the exception of decision lists [14, 20], until n... |


17 |
Improved learning of AC0 functions
- Furst, Jackson, et al.
- 1991
Citation Context ...g AC0 in time O(n^{poly(log n)}) with respect to the uniform distribution in the Valiant model (and its subsequent generalization with respect to product distributions due to Furst, Jackson and Smith [5]); several efficient algorithms for learning restricted forms of DNF with respect to the uniform distribution in the Valiant model [12]; and efficient algorithms for learning unbounded-depth read-once ... |

13 |
Learning monotone k-� DNF formulas on product distributions
- HANCOCK, MANSOUR
- 1991
Citation Context ... with respect to the uniform distribution in the Valiant model [12]; and efficient algorithms for learning unbounded-depth read-once circuits with respect to product distributions in the Valiant model [21, 7]. For all of these classes we can obtain efficient algorithms for learning with noise by Theorem 3; in this list, only for conjunctions [1] and Schapire's work on read-once circuits [21] were there pr... |

13 |
Learning probabilistic read-once formulas on product distributions
- Schapire
- 1994
Citation Context ... with respect to the uniform distribution in the Valiant model [12]; and efficient algorithms for learning unbounded-depth read-once circuits with respect to product distributions in the Valiant model [21, 7]. For all of these classes we can obtain efficient algorithms for learning with noise by Theorem 3; in this list, only for conjunctions [1] and Schapire's work on read-once circuits [21] were there pr... |

12 |
Three unfinished works on the optimal storage capacity of networks
- Gardner, Derrida
- 1989
Citation Context ...ribution). Despite the voluminous literature on learning perceptrons in general (see the work of Minsky and Papert [17] for a partial bibliography) and with respect to this distribution in particular [23, 2, 6], no efficient noise-tolerant learning algorithm has been given previously. Here we give a very simple and efficient algorithm for learning from statistical queries (and thus an algorithm tolerating n... |

9 |
Algorithmic Learning of Formal Languages and Decision Trees
- Sakakibara
- 1991
(Show Context)
Citation Context ...ning boolean conjunctions that tolerates a noise rate approaching the information-theoretic barrier of 1=2. Subsequently, there have been some isolated instances of efficient noisetolerant algorithms =-=[14, 20, 22]-=-, but little work on characterizing which classes can be efficiently learned in the presence of noise, and no general transformations of Valiant model algorithms into noise-tolerant algorithms. The pr... |

8 | The transition to perfect generalization in perceptrons - Baum, Lyuu - 1991 |