Results 1 - 10
of
272
Estimating the Support of a High-Dimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a metho ..."
Abstract
-
Cited by 381 (30 self)
- Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
On the algorithmic implementation of multiclass kernel-based vector machines
- Journal of Machine Learning Research
, 2001
"... In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic ob ..."
Abstract
-
Cited by 239 (10 self)
- Add to MetaCart
In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most of previous approaches which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixed-point algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running time improvements for large datasets. Finally, we describe various experiments with our approach comparing it to previously studied kernel-based methods. Our experiments indicate that for multiclass problems we attain state-of-the-art accuracy.
SVMTorch: Support Vector Machines for Large-Scale Regression Problems
- Journal of Machine Learning Research
, 2001
"... Support Vector Machines (SVMs) for regression problems are trained by solving a quadratic optimization problem which needs on the order of l 2 memory and time resources to solve, where l is the number of training examples. In this paper, we propose a decomposition algorithm, SVMTorch 1 , whic ..."
Abstract
-
Cited by 190 (9 self)
- Add to MetaCart
Support Vector Machines (SVMs) for regression problems are trained by solving a quadratic optimization problem which needs on the order of l 2 memory and time resources to solve, where l is the number of training examples. In this paper, we propose a decomposition algorithm, SVMTorch 1 , which is similar to SVM-Light proposed by Joachims (1999) for classification problems, but adapted to regression problems. With this algorithm, one can now efficiently solve large-scale regression problems (more than 20000 examples). Comparisons with Nodelib, another publicly available SVM algorithm for large-scale regression problems from Flake and Lawrence (2000) yielded significant time improvements. Finally, based on a recent paper from Lin (2000), we show that a convergence proof exists for our algorithm. 1. Introduction Vapnik (1995) has proposed a method to solve regression problems using support vector machines. It has yielded excellent performance on many regression and time ser...
Multiple kernel learning, conic duality, and the SMO algorithm
- In Proceedings of the 21st International Conference on Machine Learning (ICML
, 2004
"... While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimiz ..."
Abstract
-
Cited by 168 (19 self)
- Add to MetaCart
While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadratically-constrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in large-scale implementations of the SVM cannot be applied because the cost function is non-differentiable. We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes. 1.
Improvements to Platt's SMO Algorithm for SVM Classifier Design
, 1999
"... This paper points out an important source of confusion and ineciency in Platt's Sequential Minimal Optimization (SMO) algorithm that is caused by the use of a single threshold value. Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modicati ..."
Abstract
-
Cited by 139 (10 self)
- Add to MetaCart
This paper points out an important source of confusion and ineciency in Platt's Sequential Minimal Optimization (SMO) algorithm that is caused by the use of a single threshold value. Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modications of SMO. These modied algorithms perform signicantly faster than the original SMO on all benchmark datasets tried. 1 Introduction In the past few years, there has been a lot of excitement and interest in Support Vector Machines[16, 2] because they have yielded excellent generalization performance on a wide range of problems. Recently, fast iterative algorithms that are also easy to implement have been suggested[9,4,7,3,6]. Platt's Sequential Minimization Algorithm (SMO)[9,11] is an important example. A remarkable feature of SMO is that it is also extremely easy to implement. Comparative testing against other algorithms, done by Platt, have shown that SMO is often much faster and has...
Efficient SVM training using low-rank kernel representations
- Journal of Machine Learning Research
, 2001
"... SVM training is a convex optimization problem which scales with the training set size rather than the feature space dimension. While this is usually considered to be a desired quality, in large scale problems it may cause training to be impractical. The common techniques to handle this difficulty ba ..."
Abstract
-
Cited by 133 (3 self)
- Add to MetaCart
SVM training is a convex optimization problem which scales with the training set size rather than the feature space dimension. While this is usually considered to be a desired quality, in large scale problems it may cause training to be impractical. The common techniques to handle this difficulty basically build a solution by solving a sequence of small scale subproblems. Our current effort is concentrated on the rank of the kernel matrix as a source for further enhancement of the training procedure. We first show that for a low rank kernel matrix it is possible to design a better interior point method (IPM) in terms of storage requirements as well as computational complexity. We then suggest an efficient use of a known factorization technique to approximate a given kernel matrix by a low rank matrix, which in turn will be used to feed the optimizer. Finally, we derive an upper bound on the change in the objective function value based on the approximation error and the number of active constraints (support vectors). This bound is general in the sense that it holds regardless of the approximation method.
Pegasos: Primal Estimated sub-gradient solver for SVM
"... We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Abstract
-
Cited by 131 (10 self)
- Add to MetaCart
We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λɛ)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
Incremental and Decremental Support Vector Machine Learning
, 2000
"... An on-line recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the KuhnTucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decrement ..."
Abstract
-
Cited by 123 (3 self)
- Add to MetaCart
An on-line recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the KuhnTucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decremental "unlearning" offers an efficient method to exactly evaluate leave-one-out generalization performance. Interpretation of decremental unlearning in feature space sheds light on the relationship between generalization and geometry of the data. 1
Training Invariant Support Vector Machines
, 2002
"... Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide ..."
Abstract
-
Cited by 108 (15 self)
- Add to MetaCart
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
Torch: A Modular Machine Learning Software Library
, 2002
"... Many scientific communities have expressed a growing interest in machine learning algorithms recently, mainly due to the generally good results they provide, compared to traditional statistical or AI approaches. However, these machine learning algorithms are often complex to implement and to use pro ..."
Abstract
-
Cited by 92 (18 self)
- Add to MetaCart
Many scientific communities have expressed a growing interest in machine learning algorithms recently, mainly due to the generally good results they provide, compared to traditional statistical or AI approaches. However, these machine learning algorithms are often complex to implement and to use properly and efficiently. We thus present in this paper a new machine learning software library in which most state-of-the-art algorithms have already been implemented and are available in a unified framework, in order for scientists to be able to use them, compare them, and even extend them. More interestingly, this library is freely available under a BSD license and can be retrieved on the web by everyone.

