Results 1 - 10 of 145
Online Learning with Kernels, 2003
Cited by 1512 (112 self)
Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory and illustrate the power of the new algorithms for online novelty detection.
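The idea of stochastic gradient descent in a kernel feature space can be illustrated with a minimal sketch. This is not the authors' algorithm as published: the class name, the Gaussian kernel, the hinge loss with a fixed margin, the toy data, and the parameter values `eta`, `lam`, and `margin` are all assumptions chosen for illustration. Each step shrinks the existing expansion coefficients (the regularisation term) and, if the current example violates the margin, adds a new kernel term centred on it.

```python
import math


def gaussian_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length tuples (assumed choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class OnlineKernelClassifier:
    """Hypothetical online learner: f(x) = sum_i alpha_i * k(x_i, x)."""

    def __init__(self, eta=0.5, lam=0.01, margin=1.0, kernel=gaussian_kernel):
        self.eta = eta        # learning rate (assumed value)
        self.lam = lam        # regularisation strength (assumed value)
        self.margin = margin  # hinge-loss margin (assumed value)
        self.kernel = kernel
        self.points = []      # stored examples x_i
        self.alphas = []      # their expansion coefficients alpha_i

    def decision(self, x):
        return sum(a * self.kernel(p, x) for a, p in zip(self.alphas, self.points))

    def update(self, x, y):
        """One SGD step on the regularised hinge loss; y must be -1 or +1."""
        score = self.decision(x)  # prediction made before seeing the label
        # Gradient of the regulariser: shrink every existing coefficient.
        decay = 1.0 - self.eta * self.lam
        self.alphas = [a * decay for a in self.alphas]
        # Gradient of the hinge loss: add a kernel term only on a margin violation.
        if y * score < self.margin:
            self.points.append(x)
            self.alphas.append(self.eta * y)
        return score


# Toy stream with two well-separated clusters (made-up data).
stream = [((0.0, 0.0), -1), ((0.1, 0.2), -1), ((1.0, 1.0), 1), ((0.9, 1.1), 1)] * 5

model = OnlineKernelClassifier()
for x, y in stream:
    model.update(x, y)

print(model.decision((1.0, 1.0)) > 0, model.decision((0.0, 0.0)) < 0)
```

The expansion grows only on margin violations, and the decay factor gradually forgets old examples, which is one simple way to cope with a drifting target.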
Recognizing Imprecisely Localized, Partially Occluded and Expression Variant Faces from a Single Sample per Class, 2002
Cited by 110 (6 self)
The classical way of attempting to solve the face (or object) recognition problem is by using large and representative datasets. In many applications though, only one sample per class is available to the system. In this contribution, we describe a probabilistic approach that is able to compensate for imprecisely localized, partially occluded and expression variant faces even when only a single training sample per class is available to the system. To solve the localization problem, we find the subspace (within the feature space, e.g. eigenspace) that represents this error for each of the training images. To resolve the occlusion problem, each face is divided into k local regions which are analyzed in isolation. In contrast with other approaches, where a simple voting space is used, we present a probabilistic method that analyzes how "good" a local match is. To make the recognition system less sensitive to the differences between the facial expression displayed on the training and the testing images, we weight the results obtained on each local area on the basis of how much of this local area is affected by the expression displayed on the current test image.
The Entire Regularization Path for the Support Vector Machine, 2004
Cited by 107 (8 self)
In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Regularized multi-task learning, 2004
Cited by 92 (1 self)
This paper provides a foundation for multi-task learning using reproducing kernel Hilbert spaces of vector-valued functions. In this setting, the kernel is a matrix-valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix-valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi-task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation.
Proximal support vector machine classifiers. Proceedings KDD-2001: Knowledge Discovery and Data Mining, 2001
Cited by 80 (11 self)
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to a smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness. Index Terms: Support vector machines, proximal classification, generalized eigenvalues.
The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 2003
Cited by 79 (11 self)
Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe one of its key algorithms.
1-norm Support Vector Machines. Neural Information Processing Systems, 2003
Cited by 74 (10 self)
The standard 2-norm SVM is known for its good performance in two-class classification. In this paper, we consider the 1-norm SVM. We argue that the 1-norm SVM may have some advantage over the standard 2-norm SVM, especially when there are redundant noise features. We also propose an efficient algorithm that computes the whole solution path of the 1-norm SVM and hence facilitates adaptive selection of the tuning parameter for the 1-norm SVM.
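For orientation, the two problems are commonly written as a hinge loss plus a penalty, as in the sketch below. The notation is an assumption and does not come from the abstract: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, intercept $\beta_0$, coefficient vector $\beta$, tuning parameter $\lambda \ge 0$, and $[z]_+ = \max(z, 0)$.

```latex
% Assumed notation: (x_i, y_i), y_i in {-1,+1}; [z]_+ = max(z, 0); lambda >= 0.
\begin{align*}
\text{2-norm SVM:}\quad
  & \min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}
    \bigl[\,1 - y_i(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+}
    + \lambda\,\lVert\beta\rVert_2^{2}, \\
\text{1-norm SVM:}\quad
  & \min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}
    \bigl[\,1 - y_i(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+}
    + \lambda\,\lVert\beta\rVert_1 .
\end{align*}
```

Only the penalty differs. The $\ell_1$ penalty can set individual coefficients exactly to zero, which is consistent with the advantage the abstract claims when redundant noise features are present.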
Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning. PhD thesis, MIT, 2002
Cited by 68 (5 self)
Kernel Logistic Regression and the Import Vector Machine. Journal of Computational and Graphical Statistics, 2001
Cited by 64 (2 self)
The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
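A minimal sketch of plain kernel logistic regression, the model the IVM is built on, may help make the probability estimate concrete. It does not implement the IVM's selection of a subset of "import" points: every training point keeps a coefficient. The function names, the Gaussian kernel, the gradient-descent solver, the toy data, and the values of `lam`, `lr`, and `steps` are assumptions made for illustration only.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length tuples (assumed choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_klr(X, y, lam=0.01, lr=0.5, steps=500):
    """Fit f(x) = sum_j alpha_j * k(x_j, x) for labels y_i in {-1, +1}.

    Minimises (1/n) * sum_i log(1 + exp(-y_i f(x_i))) + (lam/2) * ||f||^2
    by gradient descent in function space, so the update below omits the
    leading kernel-matrix factor of the gradient taken with respect to alpha.
    """
    n = len(X)
    K = [[rbf(xi, xj) for xj in X] for xi in X]
    alpha = [0.0] * n
    for _ in range(steps):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        grad = [-y[i] * sigmoid(-y[i] * f[i]) / n + lam * alpha[i] for i in range(n)]
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha


def predict_proba(X, alpha, x):
    """Estimated probability that x belongs to the +1 class."""
    return sigmoid(sum(a * rbf(xi, x) for a, xi in zip(alpha, X)))


# Toy data with two well-separated clusters (made-up values).
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y = [-1, -1, 1, 1]

alpha = fit_klr(X, y)
print(predict_proba(X, alpha, (1.0, 1.0)) > 0.5, predict_proba(X, alpha, (0.0, 0.0)) < 0.5)
```

Unlike the SVM decision value, the output of `predict_proba` lies in (0, 1) and can be read as a class-probability estimate, which is the property the abstract highlights.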

