• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Efficient svm training using lowrank kernel representation (0)

by S Fine, K Scheinberg
Venue:JMLR
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 106
Next 10 →

Using the Nyström Method to Speed Up Kernel Machines

by Christopher Williams, Matthias Seeger - Advances in Neural Information Processing Systems 13 , 2001
"... A major problem for kernel-based predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as O(n ), where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix ..."
Abstract - Cited by 207 (6 self) - Add to MetaCart
A major problem for kernel-based predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as O(n ), where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix can be computed by the Nyström method (which is used for the numerical solution of eigenproblems). This is achieved by carrying out an eigendecomposition on a smaller system of size m < n, and then expanding the results back up to n dimensions. The computational complexity of a predictor using this approximation is O(m n). We report experiments on the USPS and abalone data sets and show that we can set m n without any significant decrease in the accuracy of the solution.

Pegasos: Primal Estimated sub-gradient solver for SVM

by Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, Andrew Cotter
"... We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Abstract - Cited by 131 (10 self) - Add to MetaCart
We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λɛ)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.

Sparse Greedy Gaussian Process Regression

by Alex J. Smola, Peter Bartlett - Advances in Neural Information Processing Systems 13 , 2001
"... We present a simple sparse greedy technique to approximate the maximum a posteriori estimate of Gaussian Processes with much improved scaling behaviour in the sample size m. In particular, computational requirements are O(n m), storage is O(nm), the cost for prediction is O(n) and the cost to comput ..."
Abstract - Cited by 93 (1 self) - Add to MetaCart
We present a simple sparse greedy technique to approximate the maximum a posteriori estimate of Gaussian Processes with much improved scaling behaviour in the sample size m. In particular, computational requirements are O(n m), storage is O(nm), the cost for prediction is O(n) and the cost to compute confidence bounds is O(nm), where n m. We show how to compute a stopping criterion, give bounds on the approximation error, and show applications to large scale problems.

Core vector machines: Fast SVM training on very large data sets

by Ivor W. Tsang, James T. Kwok, Pak-ming Cheung, Nello Cristianini - Journal of Machine Learning Research , 2005
"... Standard SVM training has O(m 3) time and O(m 2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel met ..."
Abstract - Cited by 61 (11 self) - Add to MetaCart
Standard SVM training has O(m 3) time and O(m 2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such “approximateness ” in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and realworld data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2GHz Pentium–4 PC.

On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning

by Petros Drineas, Michael W. Mahoney - JOURNAL OF MACHINE LEARNING RESEARCH , 2005
"... A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an nn Gram matrix G such that compu ..."
Abstract - Cited by 56 (6 self) - Add to MetaCart
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an nn Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form G k = CW , where C is a matrix consisting of a small number c of columns of G and W k is the best rank-k approximation to W , the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let F denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let G k be the best rank-k approximation to G. We prove that by choosing O(k/# ) columns both in expectation and with high probability, for both # = 2, F , and for all k : 0 rank(W ). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.

Training a support vector machine in the primal

by Olivier Chapelle - Neural Computation , 2007
"... Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and that there is no reason for ignoring this possibilty. On the cont ..."
Abstract - Cited by 47 (5 self) - Add to MetaCart
Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and that there is no reason for ignoring this possibilty. On the contrary, from the primal point of view new families of algorithms for large scale SVM training can be investigated.

The Kernel Recursive Least Squares Algorithm

by Yaakov Engel, Shie Mannor, Ron Meir - IEEE Transactions on Signal Processing , 2003
"... We present a non-linear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared -error regressor. Spars ..."
Abstract - Cited by 37 (2 self) - Add to MetaCart
We present a non-linear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared -error regressor. Sparsity of the solution is achieved by a sequential sparsification process that admits into the kernel representation a new input sample only if its feature space image cannot be suffciently well approximated by combining the images of previously admitted samples. This sparsification procedure is crucial to the operation of KRLS, as it allows it to operate on-line, and by effectively regularizing its solutions. A theoretical analysis of the sparsification method reveals its close affinity to kernel PCA, and a data-dependent loss bound is presented, quantifying the generalization performance of the KRLS algorithm. We demonstrate the performance and scaling properties of KRLS and compare it to a stateof -the-art Support Vector Regression algorithm, using both synthetic and real data. We additionally test KRLS on two signal processing problems in which the use of traditional least-squares methods is commonplace: Time series prediction and channel equalization.

A scalable modular convex solver for regularized risk minimization

by Choon Hui Teo, Quoc Le, Alex Smola, S. V. N. Vishwanathan - In KDD. ACM , 2007
"... A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Logistic Regression, Conditional Random Fields (CRFs ..."
Abstract - Cited by 37 (13 self) - Add to MetaCart
A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Logistic Regression, Conditional Random Fields (CRFs), and Lasso amongst others. This paper describes the theory and implementation of a highly scalable and modular convex solver which solves all these estimation problems. It can be parallelized on a cluster of workstations, allows for data-locality, and can deal with regularizers such as ℓ1 and ℓ2 penalties. At present, our solver implements 20 different estimation problems, can be easily extended, scales to millions of observations, and is up to 10 times faster than specialized solvers for many applications. The open source code is freely available as part of the ELEFANT toolbox.

Kernel Methods for Missing Variables

by Alex Smola Vishwanathan, Alex J. Smola, S. V. N. Vishwanathan - In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics , 2005
"... We present methods for dealing with missing variables in the context of Gaussian Processes and Support Vector Machines. This solves an important problem which has largely been ignored by kernel methods: How to systematically deal with incomplete data? Our method can also be applied to problems ..."
Abstract - Cited by 35 (3 self) - Add to MetaCart
We present methods for dealing with missing variables in the context of Gaussian Processes and Support Vector Machines. This solves an important problem which has largely been ignored by kernel methods: How to systematically deal with incomplete data? Our method can also be applied to problems with partially observed labels as well as to the transductive setting where we view the labels as missing data.

Efficient Kernel Machines Using the Improved Fast Gauss Transform

by Changjiang Yang, Ramani Duraiswami, Larry Davis - Advances in Neural Information Processing Systems 17 , 2004
"... The computation required for kernel machines with N training samples is O(N ). Such computational complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the com ..."
Abstract - Cited by 33 (5 self) - Add to MetaCart
The computation required for kernel machines with N training samples is O(N ). Such computational complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N). We also give an error bound for the approximation, and provide experimental results on the UCI datasets.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University