Results 1–10 of 110
Training a support vector machine in the primal
 Neural Computation, 2007
Abstract

Cited by 91 (5 self)
Most literature on Support Vector Machines (SVMs) concentrates on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and nonlinear SVMs, and that there is no reason for ignoring this possibility. On the contrary, from the primal point of view, new families of algorithms for large-scale SVM training can be investigated.
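The primal approach the abstract points to can be illustrated with a minimal sketch: plain subgradient descent on the regularized hinge loss of a linear SVM. This is an illustrative toy under assumed hyperparameters, not the authors' Newton-type primal solvers; all names and parameters here are hypothetical.

```python
import numpy as np

def train_primal_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Subgradient descent on the primal objective
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1  # points violating the margin contribute a subgradient
        grad = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
    return w

# Toy linearly separable data: two Gaussian blobs with labels -1 / +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = train_primal_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

Note that nothing here requires the dual variables: the whole optimization runs directly over the weight vector w, which is the abstract's point for the linear case.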
Fast rates for support vector machines using Gaussian kernels
 Ann. Statist., 2004
Abstract

Cited by 53 (7 self)
We establish learning rates up to the order of n⁻¹ for support vector machines with hinge loss (L1-SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov's noise assumption and local Rademacher averages. Furthermore, we introduce a new geometric noise condition for distributions that is used to bound the approximation error of Gaussian kernels in terms of their widths.
Trading convexity for scalability
 ICML '06, 23rd International Conference on Machine Learning, 2006
Abstract

Cited by 51 (3 self)
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how nonconvexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.
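The concave-convex programming the abstract mentions rests on writing a non-convex loss as a convex part plus a concave part; for the ramp loss used to stop training errors from becoming support vectors, that decomposition is a difference of two hinge losses. A small numeric check of the decomposition (parameter names here are hypothetical):

```python
import numpy as np

def hinge(z, s=0.0):
    """Shifted hinge H_s(z) = max(0, s - z)."""
    return np.maximum(0.0, s - z)

def ramp(z, s=-1.0):
    """Ramp loss as a difference of two hinges: R_s(z) = H_1(z) - H_s(z).
    H_1 is the convex part (the usual SVM hinge); subtracting the convex
    H_s caps the loss at 1 - s, so badly misclassified points (z << s)
    contribute a constant penalty and drop out of the support set."""
    return hinge(z, 1.0) - hinge(z, s)

z = np.linspace(-3, 3, 7)
# Direct definition of the capped loss, for comparison
direct = np.minimum(1.0 - (-1.0), np.maximum(0.0, 1.0 - z))
same = np.allclose(ramp(z), direct)
```

CCCP then alternates between linearizing the concave term (-H_s) at the current solution and solving the resulting convex (standard-SVM-like) problem.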
Variable Kernel Density Estimation
 Annals of Statistics, 1992
Abstract

Cited by 50 (2 self)
In this paper, we propose a method for robust kernel density estimation. We interpret a KDE with Gaussian kernel as the inner product between a mapped test point and the centroid of mapped training points in kernel feature space. Our robust KDE replaces the centroid with a robust estimate based on M-estimation [1]. The iteratively re-weighted least squares (IRWLS) algorithm for M-estimation depends only on inner products and can therefore be implemented using the kernel trick. We prove that the IRWLS method monotonically decreases its objective value at every iteration for a broad class of robust loss functions. Our proposed method is applied to synthetic data and network traffic volumes, and the results compare favorably to the standard KDE. Index Terms — kernel density estimation, M-estimator, outlier, kernel feature space, kernel trick
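A rough sketch of the idea (not the paper's exact algorithm; the Huber-type cutoff and update below are illustrative assumptions): both the distance of each mapped point to the weighted centroid and the IRWLS re-weighting can be computed from the kernel matrix alone.

```python
import numpy as np

def gauss_kernel(A, B, h=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h * h))

def robust_kde_weights(X, h=1.0, iters=20):
    """IRWLS-style reweighting in kernel feature space (Huber-type loss).
    Returns sample weights replacing the uniform 1/n of a standard KDE."""
    n = len(X)
    K = gauss_kernel(X, X, h)
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        # ||phi(x_i) - sum_j w_j phi(x_j)||^2 via the kernel trick
        d2 = np.diag(K) - 2 * K @ w + w @ K @ w
        d = np.sqrt(np.maximum(d2, 1e-12))
        # Huber psi(d)/d: 1 below the cutoff c, c/d above (downweights outliers)
        c = np.median(d)
        u = np.where(d <= c, 1.0, c / d)
        w = u / u.sum()
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 1)),
               np.array([[25.0], [30.0]])])  # two gross outliers appended
w = robust_kde_weights(X)
```

The two outliers sit far from the weighted centroid in feature space, so the re-weighting step assigns them smaller weights than the inliers, which is what makes the resulting density estimate robust.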
Statistical analysis of some multicategory large margin classification methods
 Journal of Machine Learning Research, 2004
Abstract

Cited by 45 (2 self)
The purpose of this paper is to investigate statistical properties of risk minimization based multicategory classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the consistency of classifiers obtained in the risk minimization framework with respect to the classification error. Examples are provided for four specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information can be useful for statistical inference tasks beyond classification.
Optimal rates for the regularized least-squares algorithm
 Foundations of Computational Mathematics
Abstract

Cited by 39 (8 self)
We develop a theoretical analysis of the generalization performance of regularized least-squares on reproducing kernel Hilbert spaces for supervised learning. We show that the concept of effective dimension of an integral operator plays a central role in the definition of a criterion for the choice of the regularization parameter as a function of the number of samples. In fact, a minimax analysis is performed which shows the asymptotic optimality of the above-mentioned criterion.
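The algorithm under analysis is regularized least-squares in an RKHS, i.e. kernel ridge regression. A minimal sketch with a Gaussian kernel and a fixed regularization parameter (the paper's point is precisely how that parameter should scale with the sample size; the constants below are arbitrary assumptions):

```python
import numpy as np

def krr_fit(X, y, lam=0.1, h=1.0):
    """Regularized least-squares in an RKHS (kernel ridge regression):
    alpha = (K + n*lam*I)^{-1} y, with f(x) = sum_i alpha_i k(x, x_i)."""
    n = len(X)
    d2 = (X[:, None] - X[None, :]) ** 2  # pairwise squared distances (1-D inputs)
    K = np.exp(-d2 / (2 * h * h))
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return alpha, K

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, 80)
y = np.sin(X) + 0.1 * rng.normal(size=80)
alpha, K = krr_fit(X, y, lam=0.001)
mse = np.mean((K @ alpha - y) ** 2)  # in-sample fit
```

Larger lam shrinks alpha toward zero (more bias, less variance); the paper's criterion ties the optimal lam to n through the effective dimension of the kernel's integral operator.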
Consistency and convergence rates of one-class SVM and related algorithms
, 2006
Abstract

Cited by 28 (3 self)
We determine the asymptotic limit of the function computed by support vector machines (SVMs) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error, which is shown to converge to the Bayes risk, thereby proving the Bayes-consistency of a variety of methods even though the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization cannot vanish by construction, and which is shown for the first time to be a consistent density level set estimator.
Learning on the border: Active learning in imbalanced data classification
 In Proc. ACM Conf. on Information and Knowledge Management (CIKM '07), 2007
Abstract

Cited by 27 (2 self)
This paper is concerned with the class imbalance problem, which has been known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning, which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
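The querying scheme can be sketched generically as margin-based sampling from a small random pool: pick the unlabeled point closest to the current decision boundary, without scanning the full dataset. This is an illustrative stand-in under assumed names and parameters, not the authors' exact selection rule:

```python
import numpy as np

def margin_query(w, X_unlabeled, pool_size=50, rng=None):
    """Pick the unlabeled point closest to the current hyperplane,
    searching only a small random pool rather than the whole dataset."""
    if rng is None:
        rng = np.random.default_rng()
    pool = rng.choice(len(X_unlabeled),
                      size=min(pool_size, len(X_unlabeled)),
                      replace=False)
    dist = np.abs(X_unlabeled[pool] @ w)  # |<w, x>| ~ distance to the boundary
    return pool[np.argmin(dist)]

rng = np.random.default_rng(3)
X_unl = rng.normal(0, 2, (1000, 2))
w = np.array([1.0, 0.0])  # current (hypothetical) linear classifier
idx = margin_query(w, X_unl, pool_size=100, rng=rng)
```

Because points near the boundary tend to be more class-balanced than a random draw from an imbalanced pool, labeling these queries feeds the learner a more balanced stream, which is the paper's central observation.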
Sparseness vs estimating conditional probabilities: Some asymptotic results
 Proceedings of the 17th Annual Conference on Learning Theory, 2004
Abstract

Cited by 27 (0 self)
One of the nice properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that these are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic results for the fraction of data that becomes support vectors. This enables us to characterize the exact tradeoff between sparseness and the ability to estimate conditional probabilities for these loss functions.