## The Perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant

Citations: 59 (9 self)

### BibTeX

```bibtex
@MISC{Kivinen_theperceptron,
  author = {Jyrki Kivinen and Manfred K. Warmuth},
  title  = {The Perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant},
  year   = {}
}
```


### Abstract

This paper addresses the familiar problem of predicting with a linear classifier. ...

### Citations

4195 | Pattern Classification and Scene Analysis - Duda, Hart - 1973
Citation Context: ...mal mistake bound. The best upper bound we know for learning k-literal monotone disjunctions with the Perceptron algorithm is O(kN) mistakes. This bound comes from the Perceptron Convergence Theorem [DH73], and we suspect it is not very tight for our case, particularly when k is large. The main result of this paper is to give a simple adversary strategy that forces the Perceptron algorithm to make N − ...

1759 | A theory of the learnable - Valiant - 1984
Citation Context: ...es is advantageous for the Perceptron algorithm. Note that if it is known that k is close to N, Winnow can be tuned so that it simulates the classical elimination algorithm for learning disjunctions [Val84], in which case it makes at most N − k mistakes for k-literal monotone disjunctions but is not robust against noise. The trade-off in which Winnow is able to take advantage of sparse targets and ...
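The elimination algorithm referred to in this context is simple enough to sketch. The Python rendering below is illustrative (not code from the paper): the hypothesis is the OR of all still-candidate variables, so the only possible mistakes are false positives, and each one removes at least one irrelevant variable, giving at most N − k mistakes on noise-free data.

```python
def eliminate(stream, N):
    """Classical elimination algorithm for monotone disjunctions.

    Predict 1 iff any still-candidate variable is on.  Relevant
    variables are never removed (they are 0 in every negative example),
    so every mistake is a false positive that discards at least one
    irrelevant variable: at most N - k mistakes without noise.
    """
    candidates = set(range(N))
    mistakes = 0
    for x, y in stream:                  # x: list of 0/1, y: true label
        pred = int(any(x[i] for i in candidates))
        if pred != y:                    # only false positives can occur
            mistakes += 1
            candidates -= {i for i in range(N) if x[i]}
    return mistakes
```

As the context notes, this N − k guarantee degrades badly under noise: a single mislabeled negative can discard a relevant variable permanently.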

983 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Prob - Vapnik, Chervonenkis - 1971

892 | The perceptron: A probabilistic model for information storage and organization in the brain - Rosenblatt - 1958
Citation Context: ...istakes that the learning algorithm makes for certain sequences of trials. The standard on-line algorithm for learning with linear threshold functions is the simple Perceptron algorithm of Rosenblatt [Ros58]. An alternate algorithm called Winnow was introduced by Littlestone [Lit89, Lit88]. To see how the algorithms work, consider a binary vector x_t ∈ {0, 1}^N as an instance, and assume that the algori...
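The contrast the context sets up, additive Perceptron updates versus multiplicative Winnow updates, can be sketched as single-trial update rules. This is an illustrative Python sketch; the learning rate, promotion factor, and threshold defaults here are placeholders, not the tunings studied in the paper.

```python
import numpy as np

def perceptron_step(w, x, y, eta=1.0):
    """One trial of Rosenblatt's Perceptron (threshold folded into w).
    Predict 1 iff w . x > 0; on a mistake, add or subtract eta * x.
    The correction is additive."""
    pred = int(w @ x > 0)
    if pred != y:
        w = w + eta * x if y == 1 else w - eta * x
    return w, pred

def winnow_step(w, x, y, alpha=2.0, theta=1.0):
    """One trial of Littlestone's Winnow (positive weights only).
    Predict 1 iff w . x >= theta; on a mistake, multiply the weights of
    the active inputs (x_i = 1) by alpha (promotion) or 1/alpha
    (demotion).  The correction is multiplicative."""
    pred = int(w @ x >= theta)
    if pred != y:
        factor = alpha if y == 1 else 1.0 / alpha
        w = np.where(x == 1, w * factor, w)
    return w, pred
```

The multiplicative update is what lets Winnow drive the many irrelevant weights toward zero quickly, yielding the logarithmic dependence on N in the title.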

695 | Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm - Littlestone - 1988
Citation Context: ...obtainable. This upper bound is optimal to within a constant factor since the Vapnik-Chervonenkis (VC) dimension [VC71, BEHW89] of the class of k-literal disjunctions is Ω(k + k log(N/k)) [Lit88] and this dimension is always a lower bound for the optimal mistake bound. The best upper bound we know for learning k-literal monotone disjunctions with the Perceptron algorithm is O(kN) mistakes. T...

646 | Learnability and the Vapnik–Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989 |

399 | A polynomial algorithm in linear programming, Dokl. Akad - Khachiyan
Citation Context: ...urán [MT94] have pointed out, several linear programming methods can be transformed into efficient linear on-line prediction algorithms. Most notably, this applies to Khachiyan's ellipsoid algorithm [Kha79] and to a newer algorithm due to Vaidya [Vai89]. Vaidya's algorithm achieves an upper bound of O(N^2 log N) mistakes for an arbitrary linear classifier as the target when the instances are from {0, 1}^N ...

328 | How to use expert advice - Cesa-Bianchi, Freund, et al. - 1997 |

269 | Aggregating strategies - Vovk - 1990 |

260 | Exponentiated Gradient Versus Gradient Descent for Linear Predictors - Kivinen, Warmuth - 1997
Citation Context: ...e advantage of sparse targets and dense instances and the Perceptron algorithm is able to take advantage of sparse instances and dense targets is similar to the situation in on-line linear regression [KW94]. In the regression problem, the classical Gradient Descent algorithm makes Perceptron-style additive updates, and a new family of Exponentiated Gradient algorithms makes multiplicative Winnow-style upd...

122 | A New Algorithm for Minimizing Convex Functions Over Convex Sets - Vaidya - 1996
Citation Context: ...rogramming methods can be transformed into efficient linear on-line prediction algorithms. Most notably, this applies to Khachiyan's ellipsoid algorithm [Kha79] and to a newer algorithm due to Vaidya [Vai89]. Vaidya's algorithm achieves an upper bound of O(N^2 log N) mistakes for an arbitrary linear classifier as the target when the instances are from {0, 1}^N. The Perceptron algorithm and Winnow are...

110 | Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms - Littlestone - 1989
Citation Context: ...m Winnow2 of [Lit88]) only uses positive weights (assuming that the initial weights are positive). The algorithm can be generalized for negative weights by a simple reduction [Lit88]. See Littlestone [Lit89] for a discussion on the learning rates and other parameters of Winnow. Here we just point out the standard method of allowing the threshold to be fixed to 0 at the cost of increasing the dimensionali...

107 | Redundant noisy attributes, attribute errors, and linear-threshold learning using Winnow - Littlestone - 1991
Citation Context: ...and θ_t = N/k for all t. With this tuning Winnow is guaranteed to make at most O(k + log(N/k)) mistakes if the target is a k-literal monotone disjunction, and the algorithm is even robust against noise [Lit91]. For the Perceptron algorithm we have used zero initial weights and eliminated the threshold by the transformation given in Section 2. In this case the choice of the learning rate of the Perceptron a...
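The tuning quoted in this context (all initial weights 1, threshold N/k) is concrete enough for a mistake-counting sketch. The Python below is illustrative, not code from the paper; in particular the promotion factor alpha = 2 is a common textbook choice, not necessarily the paper's.

```python
import numpy as np

def winnow_mistakes(stream, N, k, alpha=2.0):
    """Run Winnow with the quoted tuning: initial weights all 1 and
    threshold N/k.  On a false negative the active weights are
    multiplied by alpha; on a false positive, by 1/alpha.  Returns the
    total number of mistakes over the trial sequence."""
    w = np.ones(N)
    theta = N / k
    mistakes = 0
    for x, y in stream:              # x: 0/1 numpy vector, y: label
        pred = int(w @ x >= theta)
        if pred != y:
            mistakes += 1
            factor = alpha if y == 1 else 1.0 / alpha
            w = np.where(x == 1, w * factor, w)
    return mistakes
```

A relevant weight is never demoted (its variable is 0 in every negative example of a monotone disjunction), so it is promoted at most logarithmically often, which is where the log(N/k) term in the bound comes from.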

78 | Tracking the best disjunction - Auer, Warmuth - 1998 |

76 | Predicting nearly as well as the best pruning of a decision tree - Helmbold, Schapire - 1995 |

60 | Efficient learning with virtual threshold gates - Maass, Warmuth - 1995

44 | On-line learning of linear functions - Littlestone, Long, et al. - 1991
Citation Context: ...e multiple of the current instance x_t, the result is that the predictions of the Perceptron algorithm are independent of the preceding trials. Similar proofs have been devised for linear regression [LLW91]. The proofs presented here for the case of linear thresholding are slightly more involved. We also wish to point out that the behavior similar to that predicted by the worst-case lower bounds already...

36 | How fast can a threshold gate learn - Maass, Turán - 1994
Citation Context: ...ng that the target is a monotone k-literal disjunction and the instances x_t ∈ {0, 1}^N satisfy Σ_i x_{t,i} ≤ X for some value X, the bound is O(kX) mistakes. Note that always X ≤ N. As Maass and Turán [MT94] have pointed out, several linear programming methods can be transformed into efficient linear on-line prediction algorithms. Most notably, this applies to Khachiyan's ellipsoid algorithm [Kha79] and ...

35 | The statistical mechanics of learning a rule - Watkin, Rau, et al. - 1993 |

21 | Comparing several linear-threshold learning algorithms on tasks involving superfluous attributes - Littlestone - 1995
Citation Context: ...rts the Perceptron algorithm. A more extensive experimental comparison of various on-line algorithms for learning disjunctions in the presence of attribute noise has recently been done by Littlestone [Lit95]. Bounds on the worst-case number of mistakes have earlier been obtained for both the Perceptron algorithm and Winnow. Both of these upper bound proofs can be interpreted as using amortized analysis w...

19 | Adaptive mixtures of probabilistic transducers - Singer - 1996 |

10 | Learning curves in large neural networks - Sompolinsky, Seung, et al. - 1991 |

10 | The statistical mechanics of learning a rule, Rev - Watkin, Rau, et al. - 1993 |

6 | How fast can a threshold gate learn. In Computational learning theory and natural learning system: Constraints and prospects (pp. 381–414) - Maass, Turán - 1994