Results 1–10 of 303
A Comparison of Methods for Multiclass Support Vector Machines
IEEE Transactions on Neural Networks, 2002
"... Support vector machines (SVMs) were originally designed for binary classification. How to effectively extend it for multiclass classification is still an ongoing research issue. Several methods have been proposed where typically we construct a multiclass classifier by combining several binary class ..."
Abstract

Cited by 825 (21 self)
 Add to MetaCart
Support vector machines (SVMs) were originally designed for binary classification. How to effectively extend them to multiclass classification is still an ongoing research issue. Several methods have been proposed in which, typically, a multiclass classifier is constructed by combining several binary classifiers. Some authors have also proposed methods that consider all classes at once. As it is computationally more expensive to solve multiclass problems, comparisons of these methods on large-scale problems have not been seriously conducted. Especially for methods solving the multiclass SVM in one step, a much larger optimization problem is required, so experiments have so far been limited to small data sets. In this paper we give decomposition implementations for two such "all-together" methods. We then compare their performance with three methods based on binary classification: "one-against-all," "one-against-one," and directed acyclic graph SVM (DAGSVM). Our experiments indicate that the "one-against-one" and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems, methods that consider all data at once generally need fewer support vectors.
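As a rough illustration of the binary-combination strategies compared here, the first two map directly onto scikit-learn's OneVsRestClassifier and OneVsOneClassifier wrappers (DAGSVM has no stock scikit-learn counterpart); the dataset and kernel settings below are placeholders, not the paper's experimental setup.

```python
# Sketch of "one-against-all" vs. "one-against-one" multiclass SVMs via
# scikit-learn; illustrative only, not the paper's implementation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One-against-all: k binary SVMs, each separating one class from the rest;
# predict the class whose classifier reports the largest margin.
ova = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X_tr, y_tr)

# One-against-one: k(k-1)/2 pairwise binary SVMs; predict by majority vote.
ovo = OneVsOneClassifier(SVC(kernel="rbf", C=1.0)).fit(X_tr, y_tr)

print("one-against-all accuracy:", ova.score(X_te, y_te))
print("one-against-one accuracy:", ovo.score(X_te, y_te))
```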
On the algorithmic implementation of multiclass kernel-based vector machines
Journal of Machine Learning Research
"... In this paper we describe the algorithmic implementation of multiclass kernelbased vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic ob ..."
Abstract

Cited by 505 (14 self)
 Add to MetaCart
(Show Context)
In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalization of the notion of margin to multiclass problems. Using this notion, we cast multiclass categorization as a constrained optimization problem with a quadratic objective function. Unlike most previous approaches, which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints, and to decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixed-point algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running-time improvements for large datasets. Finally, we describe various experiments with our approach, comparing it to previously studied kernel-based methods. Our experiments indicate that for multiclass problems we attain state-of-the-art accuracy.
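The generalized margin at the heart of this formulation fits in a few lines; the sketch below shows only the resulting multiclass hinge loss, not the paper's dual decomposition or fixed-point solver, and the names W, x, and y are illustrative.

```python
# Multiclass hinge loss under the generalized margin: for per-class weight
# vectors w_1..w_k, the margin of (x, y) is  w_y.x - max_{r != y} w_r.x,
# and training penalizes margins below 1. Sketch only, not the paper's solver.
import numpy as np

def multiclass_hinge_loss(W, x, y):
    """W: (k, d) per-class weight matrix; x: (d,) features; y: true class."""
    scores = W @ x
    rival = np.max(np.delete(scores, y))   # best competing (wrong) class
    return max(0.0, 1.0 - (scores[y] - rival))
```

For the linear case, scikit-learn exposes this joint formulation as LinearSVC(multi_class="crammer_singer").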
Online passive-aggressive algorithms
JMLR, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new alg ..."
Abstract

Cited by 397 (24 self)
 Add to MetaCart
(Show Context)
We present a unified view of online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for all three problems. We prove worst-case loss bounds for various algorithms in both the realizable and the non-realizable case. The end result is new algorithms and accompanying loss bounds for hinge-loss regression and uniclass prediction. We also obtain refined loss bounds for previously studied classification algorithms.
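For concreteness, here is the PA-I classification update from this family sketched in NumPy; the aggressiveness cap C follows the paper's formulation, while the function and variable names are illustrative.

```python
# One passive-aggressive (PA-I) step for binary classification: do nothing
# when the hinge loss is zero, otherwise move w just far enough (capped by C)
# to satisfy the margin constraint on the current example.
import numpy as np

def pa_update(w, x, y, C=1.0):
    """x: feature vector; y: label in {-1, +1}; returns updated weights."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on this example
    if loss > 0.0:
        tau = min(C, loss / np.dot(x, x))     # closed-form step size
        w = w + tau * y * x
    return w
```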
Non-projective dependency parsing using spanning tree algorithms
In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 2005
"... We formalize weighted dependency parsing as searching for maximum spanning trees (MSTs) in directed graphs. Using this representation, the parsing algorithm of Eisner (1996) is sufficient for searching over all projective trees in O(n 3) time. More surprisingly, the representation is extended natura ..."
Abstract

Cited by 362 (10 self)
 Add to MetaCart
(Show Context)
We formalize weighted dependency parsing as the search for maximum spanning trees (MSTs) in directed graphs. Under this representation, the parsing algorithm of Eisner (1996) suffices to search over all projective trees in O(n^3) time. More surprisingly, the representation extends naturally to non-projective parsing using the Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) MST algorithm, yielding an O(n^2) parsing algorithm. We evaluate these methods on the Prague Dependency Treebank using online large-margin learning techniques (Crammer et al., 2003; McDonald et al., 2005) and show that MST parsing increases efficiency and accuracy for languages with non-projective dependencies.
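A minimal sketch of the arc-factored MST view, assuming networkx is available and using its Edmonds/arborescence routine in place of a hand-rolled Chu-Liu-Edmonds; the sentence and arc scores are invented for illustration.

```python
# Non-projective parsing as a maximum spanning arborescence over a dense
# directed graph of arc scores, with a dummy ROOT node (index 0).
import networkx as nx

words = ["ROOT", "John", "saw", "Mary"]
scores = {  # (head, modifier) -> toy arc score from some trained model
    (0, 2): 10, (0, 1): 5, (0, 3): 5,
    (2, 1): 9,  (2, 3): 9, (1, 2): 3,
    (3, 2): 3,  (1, 3): 2, (3, 1): 2,
}

G = nx.DiGraph()
for (h, m), s in scores.items():
    G.add_edge(h, m, weight=s)

# ROOT has no incoming arcs, so the maximum arborescence is rooted at it.
tree = nx.maximum_spanning_arborescence(G)
for h, m in sorted(tree.edges, key=lambda e: e[1]):
    print(f"{words[h]} -> {words[m]}")
```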
Online large-margin training of dependency parsers
In Proc. ACL, 2005
"... We present an effective training algorithm for linearlyscored dependency parsers that implements online largemargin multiclass training (Crammer and Singer, 2003; Crammer et al., 2003) on top of efficient parsing techniques for dependency trees (Eisner, 1996). The trained parsers achieve a competi ..."
Abstract

Cited by 286 (22 self)
 Add to MetaCart
(Show Context)
We present an effective training algorithm for linearly-scored dependency parsers that implements online large-margin multiclass training (Crammer and Singer, 2003; Crammer et al., 2003) on top of efficient parsing techniques for dependency trees (Eisner, 1996). The trained parsers achieve competitive dependency accuracy for both English and Czech with no language-specific enhancements.
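The online large-margin update used here (single-best MIRA) has a closed form when only the best parse is used as a constraint; a hedged sketch, with feature extraction and the parser itself stubbed out as inputs.

```python
# Single-constraint MIRA step: keep the new weights as close as possible to
# the old ones while making the gold tree outscore the predicted tree by a
# margin of at least loss(gold, pred).
import numpy as np

def mira_update(w, f_gold, f_pred, loss, C=float("inf")):
    """f_gold/f_pred: feature vectors of gold and predicted trees;
    loss: structured loss, e.g. number of words with the wrong head."""
    diff = f_gold - f_pred
    norm = np.dot(diff, diff)
    if norm == 0.0:
        return w
    violation = loss - np.dot(w, diff)        # margin shortfall
    tau = min(C, max(0.0, violation / norm))  # closed-form dual step
    return w + tau * diff
```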
Online Learning of Approximate Dependency Parsing Algorithms
In Proc. of EACL, 2006
"... In this paper we extend the maximum spanning tree (MST) dependency parsing framework of McDonald et al. (2005c) to incorporate higherorder feature representations and allow dependency structures with multiple parents per word. We show that those extensions can make the MST framework computationally ..."
Abstract

Cited by 209 (11 self)
 Add to MetaCart
(Show Context)
In this paper we extend the maximum spanning tree (MST) dependency parsing framework of McDonald et al. (2005c) to incorporate higher-order feature representations and to allow dependency structures with multiple parents per word. We show that these extensions can make the MST framework computationally intractable, but that the intractability can be circumvented with new approximate parsing algorithms. We conclude with experiments showing that discriminative online learning using these approximate algorithms achieves the best reported parsing accuracy for Czech and Danish.
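One way to picture the approximate algorithms is greedy hill-climbing over trees: start from an exactly parseable initial tree and re-attach one word at a time while the (possibly higher-order) score improves. The sketch below captures that flavor under simplifying assumptions (a single head per word and an opaque score_tree stand-in); it is not the paper's exact procedure.

```python
# Greedy approximate search: repeatedly try re-attaching each word to a new
# head, keeping any single change that yields a valid tree with higher score.
def is_tree(heads):
    """heads[m] is the head of word m (1..n); 0 is ROOT, heads[0] unused."""
    for m in range(1, len(heads)):
        seen, h = set(), m
        while h != 0:
            if h in seen:                  # cycle: not a tree
                return False
            seen.add(h)
            h = heads[h]
    return True

def hill_climb(n, score_tree, init_heads):
    """score_tree(heads) -> float is a stand-in for the trained model."""
    heads = list(init_heads)
    improved = True
    while improved:
        improved = False
        for m in range(1, n + 1):
            current, best_h = heads[m], heads[m]
            best_s = score_tree(heads)
            for h in range(0, n + 1):
                if h in (m, current):
                    continue
                heads[m] = h               # tentative re-attachment
                if is_tree(heads) and score_tree(heads) > best_s:
                    best_h, best_s = h, score_tree(heads)
                heads[m] = current
            if best_h != current:
                heads[m] = best_h
                improved = True
    return heads
```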
Pranking with Ranking
Advances in Neural Information Processing Systems 14, 2001
"... We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rankprediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe ..."
Abstract

Cited by 206 (5 self)
 Add to MetaCart
(Show Context)
We discuss the problem of ranking instances. In our framework each instance is associated with a rank or rating, an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank as close as possible to its true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake-bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.
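A sketch of the PRank-style rule described here: one weight vector plus k-1 ordered thresholds carve the real line into k rank intervals, and a mistake moves both the weights and the offending thresholds. Class and method names are illustrative.

```python
# Online rank prediction with a shared direction w and thresholds
# b_1 <= ... <= b_{k-1}; predicted rank = smallest r with w.x < b_r.
import numpy as np

class PRank:
    def __init__(self, dim, k):
        self.w = np.zeros(dim)
        self.b = np.zeros(k - 1)
        self.k = k

    def predict(self, x):
        below = np.nonzero(np.dot(self.w, x) < self.b)[0]
        return (below[0] + 1) if below.size else self.k

    def update(self, x, y):
        """y: true rank in 1..k; adjusts w and every violated threshold."""
        s = np.dot(self.w, x)
        # desired side of threshold r: +1 if the true rank lies above it
        yr = np.where(np.arange(1, self.k) <= y - 1, 1.0, -1.0)
        tau = np.where(yr * (s - self.b) <= 0, yr, 0.0)  # violated only
        self.w += tau.sum() * x
        self.b -= tau
```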
A Dual Coordinate Descent Method for Large-scale Linear SVM
"... In many applications, data appear with a huge number of instances as well as features. Linear Support Vector Machines (SVM) is one of the most popular tools to deal with such largescale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1 and L2loss functi ..."
Abstract

Cited by 188 (18 self)
 Add to MetaCart
(Show Context)
In many applications, data appear with a huge number of instances as well as features. The linear Support Vector Machine (SVM) is one of the most popular tools for dealing with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVMs with L1- and L2-loss functions. The proposed method is simple and reaches an ε-accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
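Each coordinate step has a simple closed form once w = Σ_i α_i y_i x_i is maintained incrementally; below is a minimal sketch of the L1-loss case, without the shrinking and random-permutation heuristics the paper adds, and with dense NumPy arrays standing in for sparse data.

```python
# Dual coordinate descent for the L1-loss linear SVM dual:
#   min_a 1/2 a'Qa - e'a,  0 <= a_i <= C,  Q_ij = y_i y_j x_i.x_j.
# Keeping w in sync makes each coordinate update O(d).
import numpy as np

def dual_cd(X, y, C=1.0, epochs=10):
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    qii = (X * X).sum(axis=1)                  # diagonal of Q
    for _ in range(epochs):
        for i in range(n):
            if qii[i] == 0.0:
                continue
            g = y[i] * np.dot(w, X[i]) - 1.0   # partial gradient
            a_new = np.clip(alpha[i] - g / qii[i], 0.0, C)
            w += (a_new - alpha[i]) * y[i] * X[i]
            alpha[i] = a_new
    return w
```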
Simple semi-supervised dependency parsing
In Proc. ACL/HLT, 2008
"... We present a simple and effective semisupervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of the approach in a series of dep ..."
Abstract

Cited by 165 (8 self)
 Add to MetaCart
(Show Context)
We present a simple and effective semi-supervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that the cluster-based features yield substantial gains in performance across a wide range of conditions. For example, in the case of English unlabeled second-order parsing, we improve from a baseline accuracy of 92.02% to 93.16%, and in the case of Czech unlabeled second-order parsing, we improve from a baseline accuracy of 86.13% to 87.13%. In addition, we demonstrate that our method also improves performance when small amounts of training data are available, and can roughly halve the amount of supervised data required to reach a desired level of performance.
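A toy sketch of the cluster-prefix feature idea: hierarchical (Brown-style) clustering assigns each word a bit-string path, and prefixes of different lengths give coarser or finer word classes for the parser's feature templates. The cluster table here is invented; real systems derive it from a large unannotated corpus.

```python
# Map words to bit-string cluster IDs and emit prefix features: short
# prefixes behave like coarse word classes, long ones like fine-grained
# lexical classes. The table below is a made-up stand-in.
clusters = {
    "bank":  "0010110",
    "banks": "0010111",
    "river": "0110010",
}

def cluster_features(word, prefix_lengths=(4, 6)):
    bits = clusters.get(word)
    if bits is None:
        return []                     # unknown word: no cluster features
    return [f"cluster{p}={bits[:p]}" for p in prefix_lengths]

print(cluster_features("bank"))       # ['cluster4=0010', 'cluster6=001011']
```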
Fast Kernel Classifiers With Online And Active Learning
Journal of Machine Learning Research, 2005
"... Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This ..."
Abstract

Cited by 145 (18 self)
 Add to MetaCart
Very high-dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. We then show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
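Not LASVM itself (whose PROCESS/REPROCESS steps maintain much more careful bookkeeping of the support set), but a toy sketch of the two ingredients the abstract names: online kernel updates in a single pass, with each update spent on the candidate example currently closest to the decision boundary.

```python
# Single-pass online kernel learner with naive active selection: at each
# step, pick the pool example with the smallest |f(x)| and absorb it if it
# violates the margin. Illustrative sketch only, not the LASVM algorithm.
import numpy as np

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))

class OnlineKernelLearner:
    def __init__(self):
        self.sv, self.coef = [], []   # support vectors and coefficients

    def f(self, x):
        return sum(c * rbf(s, x) for s, c in zip(self.sv, self.coef))

    def step(self, X_pool, y_pool, lr=1.0):
        # active selection: the example nearest the boundary is most informative
        i = int(np.argmin([abs(self.f(x)) for x in X_pool]))
        x, y = X_pool[i], y_pool[i]
        if y * self.f(x) < 1.0:       # margin violation: add as support vector
            self.sv.append(x)
            self.coef.append(lr * y)
        return i                      # caller removes example i from the pool
```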