## Toward efficient agnostic learning (1992)

### Other Repositories/Bibliography

Venue: Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory

Citations: 194 (7 self)

### BibTeX

@INPROCEEDINGS{Kearns92towardefficient,

author = {Michael J. Kearns and Robert E. Schapire and Linda M. Sellie},

title = {Toward efficient agnostic learning},

booktitle = {Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory},

year = {1992},

pages = {341--352}

}

### Abstract

In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.

### Citations

10973 | Computers and Intractability: A Guide to the Theory of NP-Completeness - Garey, Johnson - 1979 |

3937 | Pattern classification and scene analysis - Duda, Hart - 1973 |

Citation context: ... real-valued basis functions on X, then standard regression techniques can be used to efficiently minimize the empirical quadratic loss over the set of all linear combinations of the basis functions (Duda & Hart, 1973; Kearns & Schapire, 1990). When is empirical minimization sufficient for agnostic learning? This question has been answered in large part by Dudley (1978), Haussler (1992), Pollard (1984), Vapnik (1...
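The context above refers to minimizing empirical quadratic loss over linear combinations of basis functions by standard regression. As a rough illustration only, and not the paper's own procedure, the sketch below fits such a combination by solving the normal equations in plain Python. The helper names (`solve_linear_system`, `fit_basis_regression`, `empirical_quadratic_loss`), the polynomial basis, and the sample data are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

BasisFn = Callable[[float], float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: basis is degenerate on this sample")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_basis_regression(
    basis: Sequence[BasisFn], xs: Sequence[float], ys: Sequence[float]
) -> List[float]:
    """Return weights minimizing the empirical quadratic loss on (xs, ys)."""
    phi = [[g(x) for g in basis] for x in xs]  # design matrix
    k = len(basis)
    # Normal equations: (phi^T phi) w = phi^T y.
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def empirical_quadratic_loss(
    basis: Sequence[BasisFn],
    weights: Sequence[float],
    xs: Sequence[float],
    ys: Sequence[float],
) -> float:
    """Mean squared difference between the fitted combination and the labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(w * g(x) for w, g in zip(weights, basis))
        total += (prediction - y) ** 2
    return total / len(xs)


if __name__ == "__main__":
    # Hypothetical basis and sample, chosen only for illustration.
    demo_basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
    demo_xs = [0.0, 1.0, 2.0, 3.0]
    demo_ys = [1.0, 6.0, 17.0, 34.0]
    demo_w = fit_basis_regression(demo_basis, demo_xs, demo_ys)
    print(demo_w, empirical_quadratic_loss(demo_basis, demo_w, demo_xs, demo_ys))
```

Solving the normal equations directly keeps the sketch dependency-free; for ill-conditioned bases a library least-squares routine would be the safer choice.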

1703 | A Theory of the Learnable - Valiant - 1984 |

1495 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963 |

806 | Estimation of Dependencies Based on Empirical Data - Vapnik - 1979 |

666 | The strength of weak learnability - Schapire - 1989 |

Citation context: ...is class may be more powerful than the target class. Next, if A is the p-concept decomposition using a class $F$ of p-concepts, $T = F$, and $H \supseteq F$, then we obtain the p-concept learning model (Kearns & Schapire, 1990), and there are at least two interesting choices of loss functions. If we choose the prediction loss function Z then we ask for the optimal predictive model for the $\{0,1\}$ observations (also known as...

625 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989 |

574 | Convergence of Stochastic Processes - Pollard - 1984 |

Citation context: ...cally, we will be interested in the pseudo dimension of a class of functions $F$, a combinatorial property of $F$ that largely characterizes the uniform convergence over $F$ (Dudley, 1978; Haussler, 1992; Pollard, 1984). Let $F$ be a class of functions $f : X \to \mathbb{R}$, and let $S = \{(x_1, y_1), \ldots, (x_d, y_d)\}$ be a finite subset of $X \times \mathbb{R}$. We say that $F$ shatters $S$ if $\{0,1\}^d = \{\langle \mathrm{pos}(f(x_1) - y_1), \ldots$ ...

424 | Boosting a weak learning algorithm by majority - Freund - 1995 |

374 | Decision theoretic generalizations of the PAC model for neural net and other learning applications - Haussler - 1992 |

Citation context: ...ergence. Specifically, we will be interested in the pseudo dimension of a class of functions $F$, a combinatorial property of $F$ that largely characterizes the uniform convergence over $F$ (Dudley, 1978; Haussler, 1992; Pollard, 1984). Let $F$ be a class of functions $f : X \to \mathbb{R}$, and let $S = \{(x_1, y_1), \ldots, (x_d, y_d)\}$ be a finite subset of $X \times \mathbb{R}$. We say that $F$ shatters $S$ if $\{0,1\}^d = \{\langle \mathrm{pos}(f(x_1) -$ ...

306 | Cryptographic limitations on learning boolean formulae and finite automata - Kearns, Valiant - 1989 |

Citation context: ...h the standard PAC criterion is relaxed to demand hypotheses whose error with respect to the target is bounded only by $1/2 - 1/p(n)$ for some polynomial $p(n)$ of the complexity parameter (Kearns & Valiant, 1994; Schapire, 1990).) If T and T are two classes of boolean functions over a domain X parameterized by n, we say that T weakly approximates T if there is a polynomial $p(n)$ such that for any distribution...

281 | Constant depth circuits, Fourier transform, and learnability - Linial, Mansour, et al. - 1989 |

270 | Pattern Classification and Scene Analysis - Duda, Hart, et al. - 1973 |

198 | Efficient distribution-free learning of probabilistic concepts - Kearns, Schapire - 1994 |

Citation context: ...hesis class may be more powerful than the target class. Next, if A is the p-concept decomposition using a class $F$ of p-concepts, $T = F$, and $H \supseteq F$, then we obtain the p-concept learning model (Kearns & Schapire, 1990), and there are at least two interesting choices of loss functions. If we choose the prediction loss function Z then we ask for the optimal predictive model for the $\{0,1\}$ observations (also known as...

191 | Computational limitations on learning from examples - Pitt, Valiant - 1988 |

Citation context: ... A significant portion of the research described in this paper extends this work. Some of the results presented are also closely related to the work of Pitt and Valiant on heuristic learning (Pitt & Valiant, 1988; Valiant, 1985), which can be viewed as a variant of our agnostic PAC model. The following is a brief overview of the paper: in Section 2 we motivate and develop in detail the general learning framew...

167 | On the learnability of Boolean formulae - Kearns, Li, et al. - 1987 |

Citation context: ...en we obtain the restricted PAC model (Valiant, 1984), where the hypothesis class is the same as the target class. If we retain the condition $T = F$ but allow $H \supseteq F$, we obtain the standard PAC model (Kearns et al., 1987), where the hypothesis class may be more powerful than the target class. Next, if A is the p-concept decomposition using a class $F$ of p-concepts, $T = F$, and $H \supseteq F$, then we obtain the p-concept lear...

165 | Learning in the presence of malicious errors - Kearns, Li - 1988 |

Citation context: ...iant (1988) for a model of heuristic learning. 3.1. Agnostic learning and malicious errors. Our first result shows that agnostic PAC learning is at least as hard as PAC learning with malicious errors (Kearns & Li, 1993; Valiant, 1985) (and in fact, a partial converse holds as well). Although we will not formally define the latter model, it is equivalent to the standard PAC model with the addition of a new parameter...

144 | Learning in artificial neural networks: A statistical perspective. Neural Computation - White - 1989 |

Citation context: ...tic loss function Q. Here it is known that the quadratic loss will lead us to find a hypothesis h minimizing the quadratic distance between f and h, i.e., $E[(f - h)^2]$ (Kearns & Schapire, 1990; White, 1989). Now consider the following generalization of the standard PAC model: let F be the class of all boolean functions over the domain X, and let A be the functional decomposition using F. Thus we remov...

137 | Central limit theorems for empirical measures - Dudley - 1978 |

Citation context: ...g uniform convergence. Specifically, we will be interested in the pseudo dimension of a class of functions $F$, a combinatorial property of $F$ that largely characterizes the uniform convergence over $F$ (Dudley, 1978; Haussler, 1992; Pollard, 1984). Let $F$ be a class of functions $f : X \to \mathbb{R}$, and let $S = \{(x_1, y_1), \ldots, (x_d, y_d)\}$ be a finite subset of $X \times \mathbb{R}$. We say that $F$ shatters $S$ if $\{0,1\}^d = \{\langle$p...

117 | Learning disjunction of conjunctions - Valiant - 1985 |

Citation context: ... portion of the research described in this paper extends this work. Some of the results presented are also closely related to the work of Pitt and Valiant on heuristic learning (Pitt & Valiant, 1988; Valiant, 1985), which can be viewed as a variant of our agnostic PAC model. The following is a brief overview of the paper: in Section 2 we motivate and develop in detail the general learning framework we will use...

93 | Recent developments in nonparametric density estimation - Izenman - 1991 |

67 | Tracking drifting concepts by minimizing disagreements - Helmbold, Long - 1994 |

Citation context: ... the examples seen by a learning algorithm, another worthwhile research direction that has been pursued by a number of authors (Aldous & Vazirani, 1990; Helmbold & Long, 1994). This paper describes a preliminary study of the possibilities and limitations for efficient agnostic learning. As such, we do not claim to have a definitive model but instead use a rather general m...

50 | An improved boosting algorithm and its implications on learning complexity - Freund - 1992 |

46 | Density estimation by stochastic complexity - Rissanen, Speed, et al. - 1992 |

39 | Probability inequalities for sums of bounded random variables - Hoeffding - 1994 |

29 | A Markovian extension of Valiant's learning model - Aldous, Vazirani - 1990 |

Citation context: ...ical independence between the examples seen by a learning algorithm, another worthwhile research direction that has been pursued by a number of authors (Aldous & Vazirani, 1990; Helmbold & Long, 1994). This paper describes a preliminary study of the possibilities and limitations for efficient agnostic learning. As such, we do not claim to have a definitive model but instead...

27 | A learning criterion for stochastic rules - Yamanishi - 1992 |

18 | Learning switching concepts - Blum, Chalasani - 1992 |

17 | Efficient distribution-free learning of probabilistic concepts - Kearns, Schapire - 1994 |

14 | Learning in Artificial Neural Networks: A Statistical Perspective - White - 1989 |

1 | Boosting a weak learning algorithm by majority - KEARNS, SCHAPIRE, et al. - 1990 |

1 | Learning nonparametric densities in terms of finite dimensional parametric hypotheses - Yamanishi - 1992 |