Results 1 - 10
of
30
Classification using Intersection Kernel Support Vector Machines is Efficient ∗
"... Straightforward classification using kernelized SVMs requires evaluating the kernel for a test vector and each of the support vectors. For a class of kernels we show that one can do this much more efficiently. In particular we show that one can build histogram intersection kernel SVMs (IKSVMs) with ..."
Abstract
-
Cited by 54 (6 self)
- Add to MetaCart
Straightforward classification using kernelized SVMs requires evaluating the kernel for a test vector and each of the support vectors. For a class of kernels we show that one can do this much more efficiently. In particular we show that one can build histogram intersection kernel SVMs (IKSVMs) with runtime complexity of the classifier logarithmic in the number of support vectors as opposed to linear for the standard approach. We further show that by precomputing auxiliary tables we can construct an approximate classifier with constant runtime and space requirements, independent of the number of support vectors, with negligible loss in classification accuracy on various tasks. This approximation also applies to 1 − χ2 and other kernels of similar form. We also introduce novel features based on a multi-level histograms of oriented edge energy and present experiments on various detection datasets. On the INRIA pedestrian dataset an approximate IKSVM classifier based on these features has the current best performance, with a miss rate 13 % lower at 10−6 False Positive Per Window than the linear SVM detector of Dalal & Triggs. On the Daimler Chrysler pedestrian dataset IKSVM gives comparable accuracy to the best results (based on quadratic SVM), while being 15 × faster. In these experiments our approximate IKSVM is up to 2000 × faster than a standard implementation and requires 200 × less memory. Finally we show that a 50 × speedup is possible using approximate IKSVM based on spatial pyramid features on the Caltech 101 dataset with negligible loss of accuracy. 1.
Training a support vector machine in the primal
- Neural Computation
, 2007
"... Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and that there is no reason for ignoring this possibilty. On the cont ..."
Abstract
-
Cited by 47 (5 self)
- Add to MetaCart
Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and that there is no reason for ignoring this possibilty. On the contrary, from the primal point of view new families of algorithms for large scale SVM training can be investigated.
The Interplay of Optimization and Machine Learning Research
- Journal of Machine Learning Research
, 2006
"... The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embra ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done off-line, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques.
Approximation Methods for Gaussian Process Regression
, 2007
"... A wealth of computationally efficient approximation methods for Gaussian process regression have been recently proposed. We give a unifying overview of sparse approximations, following Quiñonero-Candela and Rasmussen (2005), and a brief review of approximate matrix-vector multiplication methods. 1 ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
A wealth of computationally efficient approximation methods for Gaussian process regression have been recently proposed. We give a unifying overview of sparse approximations, following Quiñonero-Candela and Rasmussen (2005), and a brief review of approximate matrix-vector multiplication methods. 1
Training and Testing Low-degree Polynomial Data Mappings via Linear SVM
- JOURNAL OF MACHINE LEARNING RESEARCH
"... Kernel techniques have long been used in SVM to handle linearly inseparable problems by transforming data to a high dimensional space, but training and testing large data sets is often time consuming. In contrast, we can efficiently train and test much larger data sets using linear SVM without kerne ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Kernel techniques have long been used in SVM to handle linearly inseparable problems by transforming data to a high dimensional space, but training and testing large data sets is often time consuming. In contrast, we can efficiently train and test much larger data sets using linear SVM without kernels. In this work, we apply fast linear-SVM methods to the explicit form of polynomially mapped data and investigate implementation issues. The approach enjoys fast training and testing, but may sometimes achieve accuracy close to that of using highly nonlinear kernels. Empirical experiments show that the proposed method is useful for certain large-scale data sets. We successfully apply the proposed method to a natural language processing (NLP) application by improving the testing accuracy under some training/testing speed requirements.
Sparse Kernel SVMs via Cutting-Plane Training
"... Abstract. We explore an algorithm for training SVMs with Kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Abstract. We explore an algorithm for training SVMs with Kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding-up prediction. Second, the improved sparsity can also make training of Kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the potential to make training of Kernel SVMs tractable for large training sets, where conventional methods scale quadratically due to the linear growth of the number of SVs. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation. 1
Better Multiclass Classification via a Margin-Optimized Single Binary Problem
, 2008
"... We develop a new multiclass classification method that reduces the multiclass problem to a single binary classifier (SBC). Our method constructs the binary problem by embedding smaller binary problems into a single space. A good embedding will allow for large margin classification. We show that the ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
We develop a new multiclass classification method that reduces the multiclass problem to a single binary classifier (SBC). Our method constructs the binary problem by embedding smaller binary problems into a single space. A good embedding will allow for large margin classification. We show that the construction of such an embedding can be reduced to the task of learning linear combinations of kernels. We provide a bound on the generalization error of the multiclass classifier obtained with our construction and outline the conditions for its consistency. Our empirical examination of the new method indicates that it outperforms one-vs-all, all-pairs and the error-correcting output coding scheme at least when the number of classes is small.
Building Sparse Multiple-Kernel SVM Classifiers
"... Abstract—The support vector machines (SVMs) have been very successful in many machine learning problems. However, they can be slow during testing because of the possibly large number of support vectors obtained. Recently, Wu et al. (2005) proposed a sparse formulation that restricts the SVM to use a ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract—The support vector machines (SVMs) have been very successful in many machine learning problems. However, they can be slow during testing because of the possibly large number of support vectors obtained. Recently, Wu et al. (2005) proposed a sparse formulation that restricts the SVM to use a small number of expansion vectors. In this paper, we further extend this idea by integrating with techniques from multiple-kernel learning (MKL). The kernel function in this sparse SVM formulation no longer needs to be fixed but can be automatically learned as a linear combination of kernels. Two formulations of such sparse multiple-kernel classifiers are proposed. The first one is based on a convex combination of the given base kernels, while the second one uses a convex combination of the so-called “equivalent ” kernels. Empirically, the second formulation is particularly competitive. Experiments on a large number of toy and real-world data sets show that the resultant classifier is compact and accurate, and can also be easily trained by simply alternating linear program and standard SVM solver. Index Terms—Gradient projection, kernel methods, multiple-kernel learning (MKL), sparsity, support vector machine (SVM).
COFFIN: A Computational Framework for Linear SVMs
, 2010
"... In a variety of applications, kernel machines such as Support Vector Machines (SVMs) have been used with great success often delivering state-of-the-art results. Using the kernel trick, they work on several domains and even enable heterogeneous data fusion by concatenating feature spaces or multiple ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
In a variety of applications, kernel machines such as Support Vector Machines (SVMs) have been used with great success often delivering state-of-the-art results. Using the kernel trick, they work on several domains and even enable heterogeneous data fusion by concatenating feature spaces or multiple kernel learning. Unfortunately, they are not suited for truly large-scale applications since they suffer from the curse of supporting vectors, i.e., the speed of applying SVMs decays linearly with the number of support vectors. In this paper we develop COFFIN — a new training strategy for linear SVMs that effectively allows the use of on demand computed kernel feature spaces and virtual examples in the primal. With linear training and prediction effort this framework leverages SVM applications to truly large-scale problems: As an example, we train SVMs for human splice site recognition involving 50 million examples and sophisticated string kernels. Additionally, we learn an SVM based gender detector on 5 million examples on low-tech hardware and achieve beyond the stateof-the-art accuracies on both tasks. Source code, data sets and scripts are freely available from
Exploiting separability in large-scale linear Support Vector Machine training
, 2009
"... Linear support vector machine training can be represented as a large quadratic program. We present an efficient and numerically stable algorithm for this problem using interior point methods, which requires only O(n) operations per iteration. Through exploiting the separability of the Hessian, we p ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Linear support vector machine training can be represented as a large quadratic program. We present an efficient and numerically stable algorithm for this problem using interior point methods, which requires only O(n) operations per iteration. Through exploiting the separability of the Hessian, we provide a unified approach, from an optimization perspective, to 1-norm classification, 2-norm classification, universum classification, ordinal regression and ɛ-insensitive regression. Our approach has the added advantage of obtaining the hyperplane weights and bias directly from the solver. Numerical experiments indicate that, in contrast to existing methods, the algorithm is largely unaffected by noisy data, and they show training times for our implementation are consistent and highly competitive. We discuss the effect of using multiple correctors, and monitoring the angle of the normal to the hyperplane to determine termination.

