This paper surveys some recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then look at some of the results obtained in it. We then consider some criticisms of the PAC model and the extensions proposed to address these criticisms. Finally, we look briefly at other models recently proposed in computational learning theory.

2 Introduction

It's a dangerous thing to try to formalize an enterprise as complex and varied as machine learning so that it can be subjected to rigorous mathematical analysis. To be tractable, a formal model must be simple. Thus, inevitably, most people will feel that important aspects of the activity have been left out of the theory. Of course, they will be right. Therefore, it is not advisable to present a theory of machine learning as having reduced the entire field to its bare essentials. All ...

| Citations | Title | Authors | Year |
| 3011 | Pattern Classification and Scene Analysis | Duda, Hart | 1973 |
| 1328 | A theory of the learnable | Valiant | 1984 |
| 624 | Estimation of Dependences Based on Empirical Data | Vapnik | 1982 |
| 525 | Learnability and the Vapnik-Chervonenkis dimension | Blumer, Ehrenfeucht, et al. | 1989 |
| 499 | Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm | Littlestone | 1988 |
| 498 | Queries and concept learning | Angluin | 1988 |
| 438 | The weighted majority algorithm | Littlestone, Warmuth | 1994 |
| 365 | Learning Regular Sets from Queries and Counterexamples | Angluin | 1987 |
| 310 | Learning decision lists | Rivest | 1987 |
| 242 | Cryptographic limitations on learning boolean formulae and finite automata | Kearns, Valiant | 1994 |
| 207 | Quantifying inductive bias: AI learning algorithms and Valiant's learning framework | Haussler | 1988 |
| 179 | Learning from noisy examples | Angluin, Laird | 1988 |
| 169 | Computational limitations on learning from examples | Pitt, Valiant | 1988 |
| 168 | Efficient distribution-free learning of probabilistic concepts | Kearns, Schapire | 1990 |
| 154 | The Need for Biases in Learning Generalizations | Mitchell | 1980 |
| 153 | Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition | Cover | 1965 |
| 140 | On the learnability of Boolean formulae | Kearns, Li, et al. | 1987 |
| 123 | Learning in the presence of malicious errors | Kearns, Li | 1988 |
| 106 | Learning disjunctions of conjunctions | Valiant | 1985 |
| 94 | Learning read-once formulas with queries | Angluin, Hellerstein, et al. | 1993 |
| 93 | Learning conjunctions of Horn clauses | Angluin, Frazier, et al. | 1992 |
| 91 | Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms | Littlestone | 1989 |
| 83 | Equivalence of models for polynomial learnability | Haussler, Kearns, et al. | 1991 |
| 74 | A Theory of Learning Classification Rules | Buntine | 1990 |
| 64 | Inductive inference, DFAs, and computational complexity | Pitt | 1989 |
| 62 | Learning conjunctive concepts in structural domains | Haussler | 1989 |
| 59 | From on-line to batch learning | Littlestone | 1989 |
| 39 | On learning sets and functions | Natarajan | 1989 |
| 38 | Learnability by fixed distributions | Benedek, Itai | 1988 |
| 32 | Linear Function Neurons: Structure and Training | Hampson, Volper | 1986 |
| 32 | Predicting 0,1-functions on randomly drawn points | Haussler, Littlestone, et al. | 1990 |
| 19 | Bounding sample size with the Vapnik-Chervonenkis dimension | Shawe-Taylor, Anthony, et al. | 1993 |
| 18 | Generalizing the PAC model for neural net and other learning applications | Haussler | 1989 |
| 7 | Average case analysis of empirical and explanation-based learning algorithms | Sarrett, Pazzani | 1989 |
| 4 | Training a three-neuron neural net is NP-complete | Blum, Rivest | 1988 |
| 3 | The Valiant Learning Model: Extensions and Assessment | Amsterdam | 1988 |
| 2 | When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples | Baum | 1990 |
| 2 | On the error probability of boolean concept descriptions | Bergadano, Saitta | 1989 |
| 1 | Experimental tests of statistical learning theories | Tesauro, Cohn | 1990 |