
## Ultraconservative Online Algorithms for Multiclass Problems (2001)


### Download Links

- [www.cs.huji.ac.il]
- [www.jmlr.org]
- [www.ai.mit.edu]
- [jmlr.csail.mit.edu]
- [www.cis.upenn.edu]
- [jmlr.org]
- [www.seas.upenn.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Machine Learning Research

Citations: 320 (21 self)

### Citations

13211 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...ssion put emphasis on smooth loss functions which might not be suitable for classification problems. The idea of seeking a hyperplane of a small norm is a primary goal in support vector machines (SVM) [4, 18]. Algorithms for constructing support vector machines solve optimization problems with a quadratic objective function and linear constraints. The work in [2, 9] suggests to minimize the objective func...

6599 | C4.5: Programs for machine learning
- Quinlan
- 1993

Citation Context: ...racter recognition (OCR), text classification, and medical analysis. There are numerous specialized solutions for multiclass problems for specific models such as decision trees (Breiman et al., 1984, Quinlan, 1993) and neural networks. Another general approach is based on reducing a multiclass problem to multiple binary problems using output coding (Dietterich and Bakiri, 1995, Allwein et al., 2000). An exampl...

5962 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984

Citation Context: ...blems include optical character recognition (OCR), text classification, and medical analysis. There are numerous specialized solutions for multiclass problems for specific models such as decision trees [3, 16] and neural networks. Another general approach is based on reducing a multiclass problem to multiple binary problems using output coding [6, 1]. An example of a reduction that falls into the above fra...

4841 | Pattern classification and scene analysis
- Duda, Hart
- 1973

Citation Context: ...lation to Kesler's Construction Before turning to a more complex multiclass version, we would like to discuss the relation of the family of updates described in this section to Kesler's construction (Duda and Hart, 1973). Kesler's construction is attributed to Carl Kesler and was described by Nilsson (1965). The construction reduces a multiclass classification problem to a binary problem by expanding each instance i...
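The reduction described in this context snippet can be sketched in code. This is a minimal illustration (function name and shapes are hypothetical, not from the paper): a k-class instance x in R^n with label y is expanded into k-1 binary examples in R^{kn}, each labeled +1.

```python
import numpy as np

def kesler_expand(x, y, k):
    """Kesler's construction (sketch): for each competing class r != y,
    build a vector z in R^{k*n} with +x in the block of the true class y
    and -x in the block of r. All expanded examples are labeled +1."""
    n = x.shape[0]
    expanded = []
    for r in range(k):
        if r == y:
            continue
        z = np.zeros(k * n)
        z[y * n:(y + 1) * n] = x    # +x in the true class's block
        z[r * n:(r + 1) * n] = -x   # -x in the competing class's block
        expanded.append(z)
    return expanded
```

A stacked weight vector w = (w_1, ..., w_k) then satisfies w . z > 0 for every expanded example exactly when w_y . x > w_r . x for all r != y, which is why a binary separator on the expanded data yields a multiclass classifier.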

3701 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context: ...ssion put emphasis on smooth loss functions which might not be suitable for classification problems. The idea of seeking a hyperplane of a small norm is a primary goal in support vector machines (SVM) [4, 18]. Algorithms for constructing support vector machines solve optimization problems with a quadratic objective function and linear constraints. The work in [2, 9] suggests to minimize the objective func...

1510 | Fast training of support vector machines using sequential minimal optimization
- Platt
- 1998

Citation Context: ...nt-descent method, which can be performed by going over the sample sequentially. Algorithms with a similar approach include the Sequential Minimal Optimization (SMO) algorithm introduced by Platt [15]. SMO works in rounds; on each round it chooses two examples of the sample and minimizes the objective function by modifying variables relevant only to these two examples. While these algorithms share...

1143 | The perceptron: A probabilistic model for information storage and organization in the brain
- Rosenblatt
- 1958
Citation Context: ...and, we do not want to change the current classifier too radically, especially if it classifies well most of the previously observed instances. The good old perceptron algorithm suggested by Rosenblatt [17] copes with these two requirements by replacing the classifier with a linear combination of the current hyperplane and the current instance vector. Although the algorithm uses a simple update rule, it ...
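The update rule mentioned in this snippet is the classical binary perceptron. A minimal sketch (learning rate fixed to 1, labels in {-1, +1}; names are illustrative):

```python
import numpy as np

def perceptron(examples, n_epochs=10):
    """Classical perceptron: on each mistake, replace the hyperplane w by a
    linear combination of the current w and the current instance vector."""
    n = len(examples[0][0])
    w = np.zeros(n)
    for _ in range(n_epochs):
        mistakes = 0
        for x, y in examples:           # y in {-1, +1}
            if y * np.dot(w, x) <= 0:   # mistake (or on the boundary)
                w = w + y * x           # additive update
                mistakes += 1
        if mistakes == 0:               # separated the sample; stop
            break
    return w
```

On linearly separable data the number of updates is bounded by (R/gamma)^2, the classical mistake bound the surrounding discussion builds on.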

721 | Solving multiclass learning problems via errorcorrecting output codes
- Dietterich, Bakiri
- 1995
Citation Context: ...lass problems for specific models such as decision trees [3, 16] and neural networks. Another general approach is based on reducing a multiclass problem to multiple binary problems using output coding [6, 1]. An example of a reduction that falls into the above framework is the "one-against-rest" approach. In one-against-rest a set of binary classifiers is trained, one classifier for each class. The ith cla...
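The one-against-rest reduction described in this snippet can be sketched as follows, using a perceptron as the binary learner for concreteness (any binary classifier would do; names and defaults are hypothetical):

```python
import numpy as np

def train_one_vs_rest(X, y, k, epochs=20):
    """One-against-rest (sketch): train k binary separators; separator i
    treats class i as +1 and all other classes as -1."""
    W = np.zeros((k, X.shape[1]))
    for i in range(k):
        signs = np.where(y == i, 1.0, -1.0)
        for _ in range(epochs):
            for x, s in zip(X, signs):
                if s * np.dot(W[i], x) <= 0:
                    W[i] += s * x          # perceptron update for separator i
    return W

def predict(W, x):
    """Predict the class whose separator assigns the highest score."""
    return int(np.argmax(W @ x))
```

Prediction by highest score is what ties the k independent binary problems back into a single multiclass classifier.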

561 | Reducing multiclass to binary: a unifying approach for margin classifiers
- Allwein, Schapire, et al.
- 2000
Citation Context: ...lass problems for specific models such as decision trees [3, 16] and neural networks. Another general approach is based on reducing a multiclass problem to multiple binary problems using output coding [6, 1]. An example of a reduction that falls into the above framework is the "one-against-rest" approach. In one-against-rest a set of binary classifiers is trained, one classifier for each class. The ith cla...

556 | On the algorithmic implementation of multiclass kernel-based vector machines - Crammer, Singer - 2001

521 | Large margin classification using the perceptron algorithm
- Freund, Schapire
- 1999
Citation Context: ... K(·, ·) that satisfies Mercer's conditions (Vapnik, 1998). We now obtain algorithms that work in a high dimensional space. It is also simple to incorporate voting schemes (Helmbold and Warmuth, 1995, Freund and Schapire, 1999) into the above algorithms. Before proceeding to multiplicative algorithms, let us summarize the results we have presented so far. We started with the Perceptron algorithm and extended it to mult...

231 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973

Citation Context: ...forthcoming long version of this paper. Related Work Multiclass extensions to binary approaches by maintaining multiple prototypes are by no means new. The widely read and cited book by Duda and Hart [7] describes a multiclass extension to the perceptron that employs multiple vectors. However, direct methods for online learning of multiclass problems in the mistake bound model have received relativel...

228 | On the learnability and design of output codes for multiclass problems
- Crammer, Singer
- 2002

Citation Context: ...Then, ∑_{t=1}^{T} ‖τ_t‖₁ ≤ 4R²/δ². The proof is omitted due to lack of space. 5.1 Characteristics of the solution Let us now further examine the characteristics of the solution obtained by MIRA. In [5] we investigated a related setting that uses error correcting output codes for multiclass problems. Using the results from [5] it is simple to show that the optimal in Eq. (14) is given by r = m...

145 | Additive versus exponentiated gradient updates for linear prediction
- Kivinen, Warmuth
- 1995

Citation Context: .... The perceptron algorithm spurred voluminous work which clearly cannot be covered here. For an overview of numerous additive and multiplicative online algorithms see the paper by Kivinen and Warmuth [12]. We outline below some of the research that is more relevant to the work presented in this paper. Kivinen and Warmuth [12] presented numerous online algorithms for regression. Their algorithms are ba...

112 | Learning when irrelevant attributes abound.
- Littlestone
- 1987

Citation Context: ...thods [18]. An interesting direction we are currently working on is combining our framework with other online learning algorithms for binary problems. Specifically, we have been able to combine Winnow [14] and Li and Long's ROMMA algorithm [13] with our framework, and to construct a multiclass version for those algorithms. A question that remains open is how to impose constraints similar to the one MIR...

103 | A new approximate maximal margin classification algorithm. - Gentile - 2001

95 | General convergence results for linear discriminant updates.
- Grove, Littlestone, et al.
- 2001
Citation Context: ...nds that the hyperplane will classify correctly the current new instance. Solving this minimization problem leads to an additive update rule with adaptive coefficients. Grove, Littlestone and Schuurmans [11] introduced a general framework of quasi-additive binary algorithms, which contain the perceptron and Winnow as special cases. In [10] Gentile proposed an extension to a subset of the quasi-additive al...

85 | The relaxed online maximum margin algorithm.
- Li, Long
- 2002
Citation Context: ...problems and were not analyzed in the mistake bound model. Another approach to the problem of designing an update rule which results in a linear classifier of a small norm was suggested by Li and Long [13]. The algorithm Li and Long proposed, called ROMMA, tackles the problem by finding a hyperplane with a minimal norm under two linear constraints. The first constraint is presented so that the new classifie...

64 | The AdaTron: an adaptive perceptron algorithm
- Anlauf, Biehl
- 1989

Citation Context: ... goal in support vector machines (SVM) [4, 18]. Algorithms for constructing support vector machines solve optimization problems with a quadratic objective function and linear constraints. The work in [2, 9] suggests to minimize the objective function in a gradient-descent method, which can be performed by going over the sample sequentially. Algorithms with a similar approach include the Sequential Minimi...

52 | On Weak Learning.
- Helmbold, Warmuth
- 1995
Citation Context: ...general inner-product kernel K(·, ·) that satisfies Mercer's conditions (Vapnik, 1998). We now obtain algorithms that work in a high dimensional space. It is also simple to incorporate voting schemes (Helmbold and Warmuth, 1995, Freund and Schapire, 1999) into the above algorithms. Before proceeding to multiplicative algorithms, let us summarize the results we have presented so far. We started with the Perceptron algori...

30 | The Kernel-Adatron: A fast and simple learning procedure for Support Vector Machines
- Friess, Cristianini, et al.
- 1998

Citation Context: ... goal in support vector machines (SVM) [4, 18]. Algorithms for constructing support vector machines solve optimization problems with a quadratic objective function and linear constraints. The work in [2, 9] suggests to minimize the objective function in a gradient-descent method, which can be performed by going over the sample sequentially. Algorithms with a similar approach include the Sequential Minimi...

19 | Large Margin Classification Using the Perceptron Algorithm
- Freund, Schapire
- 1998

Citation Context: ...ound of the above theorem reduces to the perceptron's mistake bound in the binary case (k = 2). To conclude this section we analyze the non-separable case by generalizing Thm. 2 of Freund and Schapire [8] to a multiclass setting. Theorem 2. Let (x_1, y_1), ..., (x_T, y_T) be an input sequence for any multiclass algorithm from the family described in Fig. 2, where x_t ∈ R^n and y_t ∈ {1, 2, ...

10 | A multi-class linear learning algorithm related to Winnow. In - Mesterharm - 2000

2 | Approximate maximal margin classification with respect to an arbitrary norm
- Gentile
- 2000

Citation Context: ...e rule with adaptive coefficients. Grove, Littlestone and Schuurmans [11] introduced a general framework of quasi-additive binary algorithms, which contain the perceptron and Winnow as special cases. In [10] Gentile proposed an extension to a subset of the quasi-additive algorithms, which uses an additive conservative update rule with decreasing learning rates. The algorithms presented in this paper are r...