## Large Margin Classification Using the Perceptron Algorithm (1998)

Venue: Machine Learning

Citations: 413 (1 self)

### BibTeX

    @INPROCEEDINGS{Freund98largemargin,
      author = {Yoav Freund and Robert E. Schapire},
      title = {Large Margin Classification Using the Perceptron Algorithm},
      booktitle = {Machine Learning},
      year = {1998},
      pages = {277--296}
    }

### Abstract

We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement and much more efficient in terms of computation time. We also show that our algorithm can be used efficiently in very high dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximal-margin classifiers on the same problem, while saving significantly on computation time and programming effort.
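The combination described in the abstract can be sketched roughly as follows. This is a minimal illustration of the voted-perceptron idea as it is commonly presented, not the authors' code: the function names (`train_voted_perceptron`, `predict_voted`), the `epochs` parameter, and the choice to treat a score of exactly zero as +1 are all assumptions of this sketch.

```python
def sign(z):
    # Assumption of this sketch: a score of exactly zero counts as +1.
    return 1 if z >= 0 else -1


def dot(u, v):
    # Plain inner product of two equal-length sequences.
    return sum(a * b for a, b in zip(u, v))


def train_voted_perceptron(examples, epochs=1):
    """Run the perceptron and record every prediction vector it visits.

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    Returns a list of (vector, count) pairs, where count is the number
    of examples the vector predicted correctly before being replaced.
    """
    dim = len(examples[0][0])
    v = [0.0] * dim
    count = 0
    history = []
    for _ in range(epochs):
        for x, y in examples:
            if sign(dot(v, x)) == y:
                # Correct prediction: the current vector survives one more round.
                count += 1
            else:
                # Mistake: store the old vector and apply the perceptron update.
                history.append((v, count))
                v = [vi + y * xi for vi, xi in zip(v, x)]
                count = 1
    history.append((v, count))
    return history


def predict_voted(history, x):
    # Each stored vector votes with weight equal to its survival count.
    vote = sum(count * sign(dot(v, x)) for v, count in history)
    return sign(vote)
```

Calling `train_voted_perceptron(data, epochs=2)` on a small separable dataset and then `predict_voted(history, x)` on a new point exercises both phases. Weighting each vector by how long it survived is what distinguishes the voted rule from simply using the final perceptron vector.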

### Citations

8973 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ...h. This gives us some indication that running the voted-perceptron algorithm with T = 1 might be better than running it to convergence; however, our experiments do not support this prediction. Vapnik [20] also gives a very similar bound for the expected error of support-vector machines. There are two differences between the bounds. First, the set of vectors on which the perceptron makes a mistake is r...

2168 | Support-Vector Networks
- Cortes, Vapnik
- 1995
Citation Context ...the new algorithm are almost identical to the bounds for SVM's given by Vapnik and Chervonenkis [19] in the linearly separable case. We repeated some of the experiments performed by Cortes and Vapnik [6] on the use of SVM on the problem of classifying handwritten digits. We tested both the voted-perceptron algorithm and a variant based on averaging rather than voting. These experiments indicate that ...

1289 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
- 1992
Citation Context ...osed by Aizerman, Braverman and Rozonoer [1], who specifically described a method for combining kernel functions with the perceptron algorithm. Continuing their work, Boser, Guyon and Vapnik [4] suggested using kernel functions with SVM's. Kernel functions are functions of two variables $K(x, y)$ which can be represented as an inner product $\Phi(x) \cdot \Phi(y)$ for some function $\Phi : \mathbb{R}^n \to$ ...

802 | Estimation of Dependencies Based on Empirical Data
- Vapnik
- 1982
Citation Context ...putation time and programming effort. 1 Introduction One of the most influential developments in the theory of machine learning in the last few years is Vapnik's work on support vector machines (SVM) [18]. Vapnik's analysis suggests the following simple method for learning complex binary classifiers. First, use some fixed mapping $\Phi$ to map the instances into some very high dimensional space in which...

786 | The perceptron: A probabilistic model for information storage and organization in the brain
- Rosenblatt
- 1958
Citation Context ...dvantage of data that are linearly separable with large margins. We named the new algorithm the voted-perceptron algorithm. The algorithm is based on the well known perceptron algorithm of Rosenblatt [16, 17] and a transformation of online learning algorithms to batch learning algorithms developed by Helmbold and Warmuth [9]. Moreover, following the work of Aizerman, Braverman and Rozonoer [1], we show th...

514 | Perceptrons: An Introduction to Computational Geometry
- Minsky, Papert
- 1969
Citation Context ...l it finds a prediction vector which is correct on all of the training set. This prediction rule is then used for predicting the labels on the test set. Block [3], Novikoff [15] and Minsky and Papert [14] have shown that if the data are linearly separable, then the perceptron algorithm will make a finite number of mistakes, and therefore, if repeatedly cycled through the training set, will converge to...

314 | How to use expert advice
- Cesa-Bianchi, Freund, et al.
- 1997
Citation Context ... to focus primarily on deterministic voting rather than randomization. The following theorem follows directly from Helmbold and Warmuth [9]. (See also Kivinen and Warmuth [10] and Cesa-Bianchi et al. [5].) Theorem 3 Assume all examples $(x, y)$ are generated i.i.d. Let $E$ be the expected number of mistakes that the online algorithm $A$ makes on a randomly generated sequence of $m+1$ examples. Then given m r...

286 | Principles of Neurodynamics
- Rosenblatt
- 1962
Citation Context ...dvantage of data that are linearly separable with large margins. We named the new algorithm the voted-perceptron algorithm. The algorithm is based on the well known perceptron algorithm of Rosenblatt [16, 17] and a transformation of online learning algorithms to batch learning algorithms developed by Helmbold and Warmuth [9]. Moreover, following the work of Aizerman, Braverman and Rozonoer [1], we show th...

282 | Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control
- Aizerman, Braverman, et al.
- 1964
Citation Context ...nblatt [16, 17] and a transformation of online learning algorithms to batch learning algorithms developed by Helmbold and Warmuth [9]. Moreover, following the work of Aizerman, Braverman and Rozonoer [1], we show that kernel functions can be used with our algorithm so that we can run our algorithm efficiently in very high dimensional spaces. Our algorithm and its analysis involve little more than com...

134 | Additive versus exponentiated gradient updates for linear prediction
- Kivinen, Warmuth
- 1997
Citation Context ...g it. In this work we decided to focus primarily on deterministic voting rather than randomization. The following theorem follows directly from Helmbold and Warmuth [9]. (See also Kivinen and Warmuth [10] and Cesa-Bianchi et al. [5].) Theorem 3 Assume all examples $(x, y)$ are generated i.i.d. Let $E$ be the expected number of mistakes that the online algorithm $A$ makes on a randomly generated sequence of ...

131 | On convergence proofs on perceptrons
- Novikoff
- 1962
Citation Context ...rough the training set until it finds a prediction vector which is correct on all of the training set. This prediction rule is then used for predicting the labels on the test set. Block [3], Novikoff [15] and Minsky and Papert [14] have shown that if the data are linearly separable, then the perceptron algorithm will make a finite number of mistakes, and therefore, if repeatedly cycled through the tra...

100 | Theory of Pattern Recognition
- Vapnik, Chervonenkis
- 1974
Citation Context ...s very simple and easy to implement, and the theoretical bounds on the expected generalization error of the new algorithm are almost identical to the bounds for SVM's given by Vapnik and Chervonenkis [19] in the linearly separable case. We repeated some of the experiments performed by Cortes and Vapnik [6] on the use of SVM on the problem of classifying handwritten digits. We tested both the voted-per...

87 | Comparison of learning algorithms for handwritten digit recognition
- LeCun, Jackel, et al.
- 1995
Citation Context ...osely the experimental setup used by Cortes and Vapnik [6] in their experiments on the NIST OCR database. We chose to use this setup because the dataset is widely available and because LeCun et al. [12] have published a detailed comparison of the performance of some of the best digit classification systems in this setup. Examples in this NIST database consist of labeled digital images of individual ...

85 | From on-line to batch learning
- Littlestone
Citation Context ...le that has survived for a long time is likely to be better than one that has only survived for a few iterations. This method was suggested by Gallant [8] who called it the pocket method. Littlestone [13] suggested a two-phase method in which the performance of all of the rules is tested on a separate test set and the rule with the least error is then used. Here we use a different method for converti...

78 | The Perceptron: A model for brain functioning
- Block
- 1962
Citation Context ... repeatedly through the training set until it finds a prediction vector which is correct on all of the training set. This prediction rule is then used for predicting the labels on the test set. Block [3], Novikoff [15] and Minsky and Papert [14] have shown that if the data are linearly separable, then the perceptron algorithm will make a finite number of mistakes, and therefore, if repeatedly cycled ...

54 | The adatron: an adaptive perceptron algorithm
- Anlauf, Biehl
- 1989
Citation Context ...level of performance). Recently, Friess, Cristianini and Campbell [7] have experimented with a different online learning algorithm called the adatron. This algorithm was suggested by Anlauf and Biehl [2] as a method for calculating the largest margin classifier (also called the "maximally stable perceptron"). They proved that their algorithm converges asymptotically to the correct solution. Our paper...

50 | On weak learning
- Helmbold, Warmuth
- 1995
Citation Context ... The algorithm is based on the well known perceptron algorithm of Rosenblatt [16, 17] and a transformation of online learning algorithms to batch learning algorithms developed by Helmbold and Warmuth [9]. Moreover, following the work of Aizerman, Braverman and Rozonoer [1], we show that kernel functions can be used with our algorithm so that we can run our algorithm efficiently in very high dimension...

37 | Optimal linear discriminants
- Gallant
- 1986
Citation Context ...ngest time before it was changed. A prediction rule that has survived for a long time is likely to be better than one that has only survived for a few iterations. This method was suggested by Gallant [8] who called it the pocket method. Littlestone [13] suggested a two-phase method in which the performance of all of the rules is tested on a separate test set and the rule with the least error is then...

30 | The Kernel Adatron: a Fast and Simple Learning Procedure for Support Vector
- Friess, Cristianini, et al.
- 1998
Citation Context ...forms better than the traditional way of using the perceptron algorithm (although all methods converge eventually to roughly the same level of performance). Recently, Friess, Cristianini and Campbell [7] have experimented with a different online learning algorithm called the adatron. This algorithm was suggested by Anlauf and Biehl [2] as a method for calculating the largest margin classifier (also c...

17 | From noise-free to noise-tolerant and from on-line to batch learning
- Klasner, Simon
- 1995
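
The citation context for Boser, Guyon and Vapnik above describes kernel functions as functions of two variables that can be written as an inner product of mapped inputs. As a small worked check of that statement, the sketch below uses the degree-2 polynomial kernel on two-dimensional inputs, which is a standard textbook example and is not taken from the paper; the names `poly2_kernel` and `feature_map` are hypothetical.

```python
import math


def dot(u, v):
    # Plain inner product of two equal-length sequences.
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    # Illustrative kernel (an assumption of this sketch): K(x, y) = (x . y)^2.
    return dot(x, y) ** 2


def feature_map(x):
    # Explicit map for 2-d inputs whose inner product reproduces poly2_kernel:
    # Phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2).
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def kernel_matches_feature_map(x, y, tol=1e-9):
    # True when K(x, y) agrees with Phi(x) . Phi(y) up to rounding error.
    return abs(poly2_kernel(x, y) - dot(feature_map(x), feature_map(y))) < tol
```

The kernel needs only one inner product in the original two-dimensional space, while the explicit map works in three dimensions. That gap grows quickly with the degree and the input dimension, which is why kernels make very high dimensional spaces practical.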