Sparseness of support vector machines (1071)

by I Steinwart

Results 1 - 10 of 66

Training a support vector machine in the primal

by Olivier Chapelle - Neural Computation, 2007
Cited by 47 (5 self)
Most literature on Support Vector Machines (SVMs) concentrates on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and that there is no reason for ignoring this possibility. On the contrary, from the primal point of view new families of algorithms for large scale SVM training can be investigated.

Trading convexity for scalability

by Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou - ICML06, 23rd International Conference on Machine Learning, 2006
Cited by 33 (2 self)
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.

Fast rates for support vector machines using gaussian kernels

by Ingo Steinwart, Clint Scovel - Ann. Statist., 2004
Cited by 31 (7 self)
We establish learning rates up to the order of $n^{-1}$ for support vector machines with hinge loss (L1-SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore we introduce a new geometric noise condition for distributions that is used to bound the approximation error of Gaussian kernels in terms of their widths.

Statistical analysis of some multi-category large margin classification methods

by Tong Zhang, Bernhard Schölkopf - Journal of Machine Learning Research, 2004
Cited by 29 (1 self)
The purpose of this paper is to investigate statistical properties of risk minimization based multicategory classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the consistency of classifiers obtained in the risk minimization framework with respect to the classification error. Examples are provided for four specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information can be useful for statistical inferencing tasks beyond classification.

Fast Rates for Regularized Least-squares Algorithm

by Andrea Caponnetto, Ernesto De Vito - Foundations of Computational Mathematics, 2005
Cited by 26 (6 self)
We develop a theoretical analysis of generalization performances of regularized least-squares on reproducing kernel Hilbert spaces for supervised learning. We show that the concept of effective dimension of an integral operator plays a central role in the definition of a criterion for the choice of the regularization parameter as a function of the number of samples. In fact a minimax analysis is performed which shows asymptotic optimality of the above mentioned criterion.

On robust properties of convex risk minimization methods for pattern recognition

by Andreas Christmann, Ingo Steinwart - Journal of Machine Learning Research, 2004
Cited by 19 (8 self)
The paper brings together methods from two disciplines: machine learning theory and robust statistics. We argue that robustness is an important aspect and we show that many existing machine learning methods based on the convex risk minimization principle have, besides other good properties, also the advantage of being robust. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds of the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. Some results on the robustness of such methods are also obtained for the sensitivity curve and the maxbias, which are two other robustness criteria. A sensitivity analysis of the support vector machine is given.

Sparseness vs estimating conditional probabilities: Some asymptotic results

by Peter L. Bartlett, Ambuj Tewari, Gábor Lugosi - Proceedings of the 17th Annual Conference On Learning Theory, 2004
Cited by 18 (0 self)
One of the nice properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that these are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic results for the fraction of data that becomes support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.

A Direct Method for Building Sparse Kernel Learning Algorithms

by Mingrui Wu, Bernhard Schölkopf, Gökhan Bakır - Journal of Machine Learning Research, 2006
Cited by 16 (0 self)
Many kernel learning algorithms, including support vector machines, result in a kernel machine, such as a kernel classifier, whose key component is a weight vector in a feature space implicitly introduced by a positive definite kernel function. This weight vector is usually obtained by solving a convex optimization problem. Based on this fact we present a direct method to build sparse kernel learning algorithms by adding one more constraint to the original convex optimization problem, such that the sparseness of the resulting kernel machine is explicitly controlled while at the same time performance is kept as high as possible. A gradient based approach is provided to solve this modified optimization problem. Applying

Some properties of regularized kernel methods

by Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Michele Piana, Alessandro Verri - Journal of Machine Learning Research, 2004
Cited by 13 (2 self)
In regularized kernel methods, the solution of a learning problem is found by minimizing functionals consisting of the sum of a data and a complexity term. In this paper we investigate some properties of a more general form of the above functionals in which the data term corresponds to the expected risk. First, we prove a quantitative version of the representer theorem holding for both regression and classification, for both differentiable and non-differentiable loss functions, and for arbitrary offset terms. Second, we show that the case in which the offset space is nontrivial corresponds to solving a standard problem of regularization in a Reproducing Kernel Hilbert Space in which the penalty term is given by a seminorm. Finally, we discuss the issues of existence and uniqueness of the solution. From the specialization of our analysis to the discrete setting, one can immediately establish a connection between the solution properties of sparsity and coefficient boundedness and some properties of the loss function. For the case of Support Vector Machines for classification, we also obtain a complete characterization of the whole method in terms of the Kuhn-Tucker conditions with no need to introduce the dual formulation.

A tutorial on ν-Support Vector Machines

by Pai-hsuen Chen, Chih-jen Lin, Bernhard Schölkopf - Applied Stochastic Models in Business and Industry, 2005
Cited by 11 (0 self)
We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called ν-SVM, including details of the algorithm and its implementation, theoretical results, and practical applications.