Results 1  10
of
389
Estimating the Support of a HighDimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a metho ..."
Abstract

Cited by 501 (32 self)
 Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
On the algorithmic implementation of multiclass kernelbased vector machines
 Journal of Machine Learning Research
"... In this paper we describe the algorithmic implementation of multiclass kernelbased vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic ob ..."
Abstract

Cited by 363 (13 self)
 Add to MetaCart
In this paper we describe the algorithmic implementation of multiclass kernelbased vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most of previous approaches which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixedpoint algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running time improvements for large datasets. Finally, we describe various experiments with our approach comparing it to previously studied kernelbased methods. Our experiments indicate that for multiclass problems we attain stateoftheart accuracy.
Pegasos: Primal Estimated subgradient solver for SVM
"... We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Abstract

Cited by 279 (15 self)
 Add to MetaCart
We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total runtime of our method is Õ(d/(λɛ)), where d is a bound on the number of nonzero features in each example. Since the runtime does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to nonlinear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an orderofmagnitude speedup over previous SVM learning methods.
Multiple kernel learning, conic duality, and the SMO algorithm
 In Proceedings of the 21st International Conference on Machine Learning (ICML
, 2004
"... While classical kernelbased classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimiz ..."
Abstract

Cited by 278 (32 self)
 Add to MetaCart
While classical kernelbased classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadraticallyconstrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in largescale implementations of the SVM cannot be applied because the cost function is nondifferentiable. We propose a novel dual formulation of the QCQP as a secondorder cone programming problem, and show how to exploit the technique of MoreauYosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMObased algorithm is significantly more efficient than the generalpurpose interior point methods available in current optimization toolboxes. 1.
SVMTorch: Support Vector Machines for LargeScale Regression Problems
 Journal of Machine Learning Research
, 2001
"... Support Vector Machines (SVMs) for regression problems are trained by solving a quadratic optimization problem which needs on the order of l 2 memory and time resources to solve, where l is the number of training examples. In this paper, we propose a decomposition algorithm, SVMTorch 1 , whic ..."
Abstract

Cited by 253 (10 self)
 Add to MetaCart
Support Vector Machines (SVMs) for regression problems are trained by solving a quadratic optimization problem which needs on the order of l 2 memory and time resources to solve, where l is the number of training examples. In this paper, we propose a decomposition algorithm, SVMTorch 1 , which is similar to SVMLight proposed by Joachims (1999) for classification problems, but adapted to regression problems. With this algorithm, one can now efficiently solve largescale regression problems (more than 20000 examples). Comparisons with Nodelib, another publicly available SVM algorithm for largescale regression problems from Flake and Lawrence (2000) yielded significant time improvements. Finally, based on a recent paper from Lin (2000), we show that a convergence proof exists for our algorithm. 1. Introduction Vapnik (1995) has proposed a method to solve regression problems using support vector machines. It has yielded excellent performance on many regression and time ser...
Efficient SVM training using lowrank kernel representations
 Journal of Machine Learning Research
, 2001
"... SVM training is a convex optimization problem which scales with the training set size rather than the feature space dimension. While this is usually considered to be a desired quality, in large scale problems it may cause training to be impractical. The common techniques to handle this difficulty ba ..."
Abstract

Cited by 188 (3 self)
 Add to MetaCart
SVM training is a convex optimization problem which scales with the training set size rather than the feature space dimension. While this is usually considered to be a desired quality, in large scale problems it may cause training to be impractical. The common techniques to handle this difficulty basically build a solution by solving a sequence of small scale subproblems. Our current effort is concentrated on the rank of the kernel matrix as a source for further enhancement of the training procedure. We first show that for a low rank kernel matrix it is possible to design a better interior point method (IPM) in terms of storage requirements as well as computational complexity. We then suggest an efficient use of a known factorization technique to approximate a given kernel matrix by a low rank matrix, which in turn will be used to feed the optimizer. Finally, we derive an upper bound on the change in the objective function value based on the approximation error and the number of active constraints (support vectors). This bound is general in the sense that it holds regardless of the approximation method.
Improvements to Platt's SMO Algorithm for SVM Classifier Design
, 1999
"... This paper points out an important source of confusion and ineciency in Platt's Sequential Minimal Optimization (SMO) algorithm that is caused by the use of a single threshold value. Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modicati ..."
Abstract

Cited by 187 (12 self)
 Add to MetaCart
This paper points out an important source of confusion and ineciency in Platt's Sequential Minimal Optimization (SMO) algorithm that is caused by the use of a single threshold value. Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modications of SMO. These modied algorithms perform signicantly faster than the original SMO on all benchmark datasets tried. 1 Introduction In the past few years, there has been a lot of excitement and interest in Support Vector Machines[16, 2] because they have yielded excellent generalization performance on a wide range of problems. Recently, fast iterative algorithms that are also easy to implement have been suggested[9,4,7,3,6]. Platt's Sequential Minimization Algorithm (SMO)[9,11] is an important example. A remarkable feature of SMO is that it is also extremely easy to implement. Comparative testing against other algorithms, done by Platt, have shown that SMO is often much faster and has...
Incremental and Decremental Support Vector Machine Learning
, 2000
"... An online recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the KuhnTucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decrement ..."
Abstract

Cited by 168 (4 self)
 Add to MetaCart
An online recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the KuhnTucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decremental "unlearning" offers an efficient method to exactly evaluate leaveoneout generalization performance. Interpretation of decremental unlearning in feature space sheds light on the relationship between generalization and geometry of the data. 1
Training Invariant Support Vector Machines
, 2002
"... Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide ..."
Abstract

Cited by 136 (16 self)
 Add to MetaCart
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the wellknown MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
Learning to rank: from pairwise approach to listwise approach
 In Proc. ICML’07
, 2007
"... The paper is concerned with learning to rank, which is to construct a model or a function for ranking objects. Learning to rank is useful for document retrieval, collaborative filtering, and many other applications. Several methods for learning to rank have been proposed, which take object pairs as ..."
Abstract

Cited by 132 (23 self)
 Add to MetaCart
The paper is concerned with learning to rank, which is to construct a model or a function for ranking objects. Learning to rank is useful for document retrieval, collaborative filtering, and many other applications. Several methods for learning to rank have been proposed, which take object pairs as ‘instances ’ in learning. We refer to them as the pairwise approach in this paper. Although the pairwise approach offers advantages, it ignores the fact that ranking is a prediction task on list of objects. The paper postulates that learning to rank should adopt the listwise approach in which lists of objects are used as ‘instances ’ in learning. The paper proposes a new probabilistic method for the approach. Specifically it introduces two probability models, respectively referred to as permutation probability and top one probability, to define a listwise loss function for learning. Neural Network and Gradient Descent are then employed as model and algorithm in the learning method. Experimental results on information retrieval show that the proposed listwise approach performs better than the pairwise approach. Microsoft technique report. A short version of this work is published