Online Learning with Kernels
, 2003
Cited by 2812 (126 self)
Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds and, moreover, we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition ...
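To make the idea of stochastic gradient descent in a feature space concrete, here is a minimal sketch of an online kernel classifier. It is an illustration under stated assumptions, not the paper's exact algorithm: the Gaussian kernel, the soft-margin (hinge) loss, the fixed learning rate `eta`, the regularisation constant `lam`, and all class and function names are choices made for this example.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OnlineKernelClassifier:
    """Hypothetical sketch: SGD on a regularised hinge loss in an RKHS.

    The hypothesis is kept as a kernel expansion
    f(x) = sum_i alpha_i * k(x_i, x).
    """

    def __init__(self, kernel=gaussian_kernel, eta=0.5, lam=0.01, margin=1.0):
        self.kernel = kernel
        self.eta = eta        # learning rate (assumed constant)
        self.lam = lam        # regularisation strength
        self.margin = margin  # target margin of the hinge loss
        self.points = []      # expansion points x_i
        self.alphas = []      # expansion coefficients alpha_i

    def decision(self, x):
        return sum(a * self.kernel(p, x)
                   for a, p in zip(self.alphas, self.points))

    def update(self, x, y):
        """Process one labelled example (y in {-1, +1}); return the prediction made before the update."""
        f_x = self.decision(x)
        # Gradient of the regulariser: shrink every existing coefficient.
        shrink = 1.0 - self.eta * self.lam
        self.alphas = [a * shrink for a in self.alphas]
        # Gradient of the hinge loss: add a new term only on a margin error.
        if y * f_x < self.margin:
            self.points.append(x)
            self.alphas.append(self.eta * y)
        return f_x
```

The shrinking step is what lets old examples fade, which is one way to read the abstract's remark about drifting targets; a practical version would also truncate coefficients that have decayed to near zero so the expansion does not grow without bound.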
A tutorial on support vector regression
, 2004
Cited by 828 (3 self)
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
Text Classification using String Kernels
Cited by 494 (7 self)
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the performance of the kernel with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered for comparison with the subsequences kernel with different decay factors. For larger documents and datasets the paper introduces an approximation technique that is shown to deliver good approximations efficiently for large datasets.
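The dynamic programming idea can be sketched as a memoised recursion over string prefixes. This is a minimal, unoptimised illustration of a gap-weighted subsequence kernel, assuming a decay factor `lam` in (0, 1]; the function names are hypothetical, and the recursion shown is a straightforward textbook form rather than the paper's most efficient variant.

```python
from functools import lru_cache


def subsequence_kernel(s, t, k, lam):
    """Gap-weighted subsequence kernel of order k between strings s and t.

    Each common subsequence contributes lam raised to the total length
    it spans in s plus the total length it spans in t.
    """

    @lru_cache(maxsize=None)
    def k_prime(i, ls, lt):
        # Auxiliary kernel on the prefixes s[:ls] and t[:lt]: it weights
        # partial matches of length i by their distance to the prefix ends.
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]
        total = lam * k_prime(i, ls - 1, lt)
        for j in range(1, lt + 1):  # j is a 1-based position in t
            if t[j - 1] == x:
                total += k_prime(i - 1, ls - 1, j - 1) * lam ** (lt - j + 2)
        return total

    @lru_cache(maxsize=None)
    def k_full(ls, lt):
        # Kernel of order k on the prefixes s[:ls] and t[:lt].
        if min(ls, lt) < k:
            return 0.0
        x = s[ls - 1]
        total = k_full(ls - 1, lt)
        for j in range(1, lt + 1):
            if t[j - 1] == x:
                total += k_prime(k - 1, ls - 1, j - 1) * lam ** 2
        return total

    return k_full(len(s), len(t))


def normalised_subsequence_kernel(s, t, k, lam):
    """Cosine-style normalisation so that a string compared with itself gives 1."""
    denom = (subsequence_kernel(s, s, k, lam)
             * subsequence_kernel(t, t, k, lam)) ** 0.5
    return subsequence_kernel(s, t, k, lam) / denom if denom > 0 else 0.0
```

For example, with k = 2 the strings "cat" and "car" share only the subsequence "ca", which spans two characters in each string, so the unnormalised kernel value is `lam ** 4`. The recursive form is only suitable for short strings; long documents would need an iterative table and, as the abstract notes, an approximation scheme.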
Input Space Versus Feature Space in Kernel-Based Methods
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 1999
Cited by 132 (5 self)
This paper collects some ideas targeted at advancing our understanding of the feature spaces associated with support vector (SV) kernel functions. We first discuss the geometry of feature space. In particular, we review what is known about the shape of the image of input space under the feature space map, and how this influences the capacity of SV methods. Following this, we describe how the metric governing the intrinsic geometry of the mapped surface can be computed in terms of the kernel, using the example of the class of inhomogeneous polynomial kernels, which are often used in SV pattern recognition. We then discuss the connection between feature space and input space by dealing with the question of how one can, given some vector in feature space, find a pre-image (exact or approximate) in input space. We describe algorithms to tackle this issue, and show their utility in two applications of kernel methods. First, we use them to reduce the computational complexity of SV decision functions; second, we combine them with the Kernel PCA algorithm, thereby constructing a nonlinear statistical denoising technique which is shown to perform well on real-world data.
Online Bayes Point Machines
Cited by 82 (3 self)
We present a new and simple algorithm for learning large margin classifiers that works in a truly online manner. The algorithm generates a linear classifier by averaging the weights associated with several perceptron-like algorithms run in parallel in order to approximate the Bayes point. A random subsample of the incoming data stream is used to ensure diversity in the perceptron solutions. We experimentally study the algorithm's performance in online and batch learning settings.
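The averaging scheme described above can be sketched in a few lines. This is a rough illustration under assumptions, not the authors' implementation: the number of perceptrons, the per-example keep probability `keep_prob`, the plain (unnormalised) averaging, the absence of a bias term, and the function names are all choices made for this example.

```python
import random


def train_bayes_point(stream, dim, n_perceptrons=5, keep_prob=0.5, seed=0):
    """Run several perceptrons in parallel and return their averaged weights.

    stream yields (x, y) pairs with x a sequence of length dim and
    y in {-1, +1}. Each perceptron sees each example only with
    probability keep_prob, so the individual solutions differ.
    """
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in range(n_perceptrons)]
    for x, y in stream:
        for w in weights:
            if rng.random() >= keep_prob:
                continue  # this perceptron skips the example
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # mistake: standard perceptron update
                for i in range(dim):
                    w[i] += y * x[i]
    # Average the weight vectors to approximate the centre of the solutions.
    return [sum(w[i] for w in weights) / n_perceptrons for i in range(dim)]


def predict(w, x):
    """Classify x with the averaged weight vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
```

Because each example is processed once and then discarded, the loop fits the online setting; running it over several passes of a fixed dataset corresponds to the batch use mentioned in the abstract.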
Kernel methods for missing variables
 Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics
, 2005
Cited by 77 (3 self)
We present methods for dealing with missing variables in the context of Gaussian Processes and Support Vector Machines. This solves an important problem which has largely been ignored by kernel methods: how to systematically deal with incomplete data. Our method can also be applied to problems with partially observed labels as well as to the transductive setting, where we view the labels as missing data. Our approach relies on casting kernel methods as an estimation problem in exponential families. Hence, estimation with missing variables becomes a problem of computing marginal distributions and finding efficient optimization methods. To that end we propose an optimization scheme which extends the Concave Convex Procedure (CCP) of Yuille and Rangarajan, and present a simplified and intuitive proof of its convergence. We show how our algorithm can be specialized to various cases in order to efficiently solve the optimization problems that arise. Encouraging preliminary experimental results on the USPS dataset are also presented.
Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure
 IEEE Transactions on Systems, Man and Cybernetics, Part B
, 2004
Cited by 68 (5 self)
Fuzzy c-means clustering (FCM) with spatial constraints (FCM_S) is an effective algorithm suitable for image segmentation. Its effectiveness is due not only to the introduction of fuzziness for the belongingness of each pixel but also to the exploitation of spatial contextual information. Although the contextual information can raise its insensitivity to noise to some extent, FCM_S (1) still lacks enough robustness to noise and outliers and (2) is not suitable for revealing non-Euclidean structure of the input data due to the use of Euclidean distance (L2 norm). In this paper, to overcome the above problems, we first propose two variants, FCM_S1 and FCM_S2, of FCM_S to aim at simplifying its computation and then extend them, including ...
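The title refers to a kernel-induced distance as a replacement for the Euclidean one. As a minimal sketch of the general idea, the squared distance between two points in the feature space of a kernel k is k(x, x) - 2 k(x, y) + k(y, y). Whether the paper uses exactly this form is not stated in the truncated abstract, so the Gaussian kernel, its `sigma` parameterisation, and the function names below are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel; the 2 * sigma**2 scaling is an assumed convention."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance_sq(x, y, kernel=gaussian_kernel):
    """Squared distance between x and y in the feature space of `kernel`."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
```

With a Gaussian kernel, k(x, x) = 1, so the squared distance reduces to 2 (1 - k(x, y)) and never exceeds 2. A far-away outlier therefore has a bounded influence, unlike under the squared Euclidean distance, which is one common motivation for such measures in robust clustering.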
Estimating labels from label proportions
 Proceedings of the 25th Annual International Conference on Machine Learning
, 2008
Cited by 31 (2 self)
Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, also with known label proportions. This problem appears in areas like e-commerce, spam filtering and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice.
An approach to spacecraft anomaly detection problem using kernel feature space
 in Proc. PAKDD 2005: Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining
, 2005
Cited by 26 (1 self)
Development of advanced anomaly detection and failure diagnosis technologies for spacecraft is a significant issue in the space industry, because the space environment is harsh, distant and uncertain. While several modern approaches based on qualitative reasoning, expert systems, and probabilistic reasoning have been developed recently for this purpose, all of them share a common difficulty in obtaining accurate and complete a priori knowledge on the space systems from human experts. A reasonable alternative to these conventional anomaly detection methods is to reuse the vast amount of telemetry data, a multidimensional time series continuously produced by a number of system components in the spacecraft. This paper proposes a novel "knowledge-free" anomaly detection method for spacecraft based on Kernel Feature Space and directional distribution, which constructs a system behavior model from past telemetry data recorded in normal operation and monitors the current system status by checking incoming data against the model. In this method, we regard anomaly phenomena as unexpected changes of causal associations in the spacecraft system, and hypothesize that the significant causal associations inside the system will appear in the form of principal component directions in a high-dimensional nonlinear feature space which is constructed by a kernel function and a set of data. We have confirmed the effectiveness of the proposed anomaly detection method by applying it to the telemetry data obtained from a simulator of an orbital transfer vehicle designed to make a rendezvous maneuver with the International ...
Invariant Pattern Recognition by Semidefinite Programming Machines
, 2003
Cited by 25 (1 self)
Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Previous approaches are either based on regularisation or on the generation of virtual (transformed) examples. We develop a new framework for learning linear classifiers under known transformations based on semidefinite programming. We present a new learning algorithm, the Semidefinite Programming Machine (SDPM), which is able to find a maximum margin hyperplane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in the dual variables and makes it possible to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one transformation parameter, and to learning with kernels are discussed. In experiments we use a Taylor expansion to locally approximate rotational invariance in pixel images from USPS and find improvements over known methods.