Results 1–10 of 30
Fisher Discriminant Analysis With Kernels
1999
Cited by 358 (15 self)
A nonlinear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) nonlinear decision function in input space. Large scale simulations demonstrate the competitiveness of our approach.
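The kernel Fisher discriminant described above can be sketched in a few lines of numpy for the binary case. The RBF kernel, the regularizer on the within-class scatter, and the closed-form rank-one solution are implementation choices for illustration, not the paper's exact recipe:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between row-sample arrays A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_fda(X, y, gamma=1.0, reg=1e-3):
    """Binary kernel Fisher discriminant: returns a projection function.

    Solves for the expansion coefficients alpha of the discriminant
    direction in feature space; since the between-class scatter is
    rank one, alpha = N^{-1}(m0 - m1) up to scale.
    """
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    class_means = []
    N = reg * np.eye(n)              # regularized within-class scatter
    for c in (0, 1):
        idx = np.where(y == c)[0]
        Kc = K[:, idx]
        class_means.append(Kc.mean(axis=1))
        H = np.eye(len(idx)) - np.ones((len(idx), len(idx))) / len(idx)
        N += Kc @ H @ Kc.T           # H is the idempotent centering matrix
    alpha = np.linalg.solve(N, class_means[0] - class_means[1])
    return lambda Z: rbf_kernel(Z, X, gamma) @ alpha
```

Projecting new points is then a single kernel evaluation against the training set followed by a dot product with `alpha`, which is what makes the feature-space computation tractable.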
Kernel PCA and De-Noising in Feature Spaces
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 11
1999
Cited by 133 (13 self)
Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question of how to use nonlinear features for data compression, reconstruction, and denoising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high-dimensional feature space and need not have preimages in input space. This work presents ideas for finding approximate preimages, focusing on Gaussian kernels, and shows experimental results using these preimages in data reconstruction and denoising on toy examples as well as on real-world data.
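For Gaussian kernels, an approximate pre-image can be found by a fixed-point iteration in input space. A minimal numpy sketch, assuming the expansion coefficients `gammas` and bandwidth are given (e.g. from a kernel PCA projection); the starting point and stopping rule are illustrative choices:

```python
import numpy as np

def gaussian_preimage(X, gammas, sigma=1.0, n_iter=100):
    """Fixed-point iteration for an approximate pre-image of the
    feature-space expansion Psi = sum_i gammas[i] * phi(X[i]) under a
    Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2)).

    Each step re-weights the training points by their kernel value at
    the current estimate z and moves z to the weighted mean.
    """
    z = X.mean(axis=0)                      # heuristic starting point
    for _ in range(n_iter):
        w = gammas * np.exp(-((X - z) ** 2).sum(axis=1) / (2 * sigma**2))
        denom = w.sum()
        if abs(denom) < 1e-12:              # iteration broke down; keep z
            break
        z = (w[:, None] * X).sum(axis=0) / denom
    return z
```

Because the iteration only ever forms convex-like combinations of the training points, the returned pre-image stays in input space even though the expansion itself lives in feature space.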
Prior Knowledge in Support Vector Kernels
1998
Cited by 105 (13 self)
We explore methods for incorporating prior knowledge about a problem at hand in Support Vector learning machines. We show that both invariances under group transformations and prior knowledge about locality in images can be incorporated by constructing appropriate kernel functions.
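One illustrative way to encode an invariance directly in a kernel is to score a point against small transformations of the other and keep the best match, a "jittered" RBF kernel. The specific kernel, shift set, and parameters below are assumptions for illustration, not the paper's construction:

```python
import numpy as np

def jitter_kernel(x, y, shifts=(-1, 0, 1), gamma=1.0):
    """RBF kernel made (approximately) translation-invariant by
    comparing x against each cyclic shift of y and keeping the
    best-matching one."""
    best = -np.inf
    for s in shifts:
        ys = np.roll(y, s)
        best = max(best, np.exp(-gamma * ((x - ys) ** 2).sum()))
    return best
```

A pattern and its one-step translate then receive the maximal kernel value, whereas a plain RBF kernel would penalize the shift.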
A Review of Kernel Methods in Machine Learning
2006
Cited by 40 (3 self)
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.
A Kernel Method for the Two-Sample Problem
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 19
2007
Cited by 39 (12 self)
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g. a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
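The quadratic-time statistic is just three kernel-matrix means. A minimal sketch of the biased squared-MMD estimate with an RBF kernel; the bandwidth here is a free assumption (in practice it is often set by a median-distance heuristic):

```python
import numpy as np

def mmd2(X, Y, gamma=0.5):
    """Biased quadratic-time estimate of the squared maximum mean
    discrepancy between samples X and Y under a Gaussian RBF kernel:
    mean k(X,X) + mean k(Y,Y) - 2 mean k(X,Y)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

Samples from the same distribution drive the estimate toward zero, while a shift in distribution inflates it; the paper's tests turn this into a decision by thresholding against a bound or the statistic's null distribution.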
Kernel PCA Pattern Reconstruction via Approximate Pre-Images
1998
Cited by 39 (2 self)
Algorithms based on Mercer kernels construct their solutions in terms of expansions in a high-dimensional feature space F. Previous work has shown that all algorithms which can be formulated in terms of dot products in F can be performed using a kernel without explicitly working in F. The list of such algorithms includes support vector machines and nonlinear kernel principal component extraction. So far, however, it did not include the reconstruction of patterns from their largest nonlinear principal components, a technique which is common practice in linear principal component analysis. The present work proposes an idea for approximately performing this task. As an illustrative example, an application to the denoising of data clusters is presented.
1 Kernels and Feature Spaces
A Mercer kernel is a function k(x, y) which for all data sets {x_1, …, x_ℓ} ⊂ R^N gives rise to a positive (not necessarily definite) matrix K_ij := k(x_i, x_j) [4]. One can show that using k ins...
Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes
 International Journal of Computer Vision
2005
Cited by 33 (12 self)
We derive a family of kernels on dynamical systems by applying the Binet-Cauchy theorem to trajectories of states. Our derivation provides a unifying framework for all kernels on dynamical systems currently used in machine learning, including kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. In the case of linear time-invariant systems, we derive explicit formulae for computing the proposed Binet-Cauchy kernels by solving Sylvester equations, and relate the proposed kernels to existing kernels based on cepstrum coefficients and subspace angles. Besides their theoretical appeal, these kernels can be used efficiently in the comparison of video sequences of dynamic scenes that can be modeled as the output of a linear time-invariant dynamical system. One advantage of our kernels is that they take the initial conditions of the dynamical systems into account. As a first example, we use our kernels to compare video sequences of dynamic textures. As a second example, we apply our kernels to the problem of clustering short clips of a movie. Experimental evidence shows superior performance of our kernels.
Keywords: Binet-Cauchy theorem, ARMA models and dynamical systems, Sylvester
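The computational core the abstract mentions is solving Sylvester equations. A minimal dense solver via the Kronecker-product linear system; this is a hypothetical helper for illustration, not the paper's implementation, and is only sensible for the small state dimensions typical of LTI models fitted to video clips:

```python
import numpy as np

def solve_sylvester(A, B, C):
    """Solve the Sylvester equation A X + X B = C by rewriting it as
    (I kron A + B^T kron I) vec(X) = vec(C), with vec taken
    column-major.  Cost is O((nm)^3), so small systems only."""
    n, m = C.shape
    M = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(M, C.flatten(order="F"))
    return x.reshape((n, m), order="F")
```

For larger state dimensions a Bartels-Stewart-style solver would be the usual choice; the Kronecker form above is just the most direct transcription of the equation.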
Hash Kernels for Structured Data
2009
Cited by 26 (3 self)
We propose hashing to facilitate efficient kernels. This generalizes previous work using sampling and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we give deviation bounds from the exact kernel matrix. This has applications to estimation on strings and graphs.
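A minimal sketch of the hashing idea: map token bags into a fixed-dimensional vector by hashed index, so kernel evaluations on streams or sparse feature spaces reduce to dot products of small dense vectors. The md5-based hash and the signed variant are illustrative assumptions, not the paper's exact scheme:

```python
import hashlib
import numpy as np

def hash_features(tokens, dim=64):
    """Hash a bag of tokens into a dim-dimensional vector.  The sign
    bit (taken from high-order hash bits) makes collisions cancel in
    expectation, so inner products approximate the exact kernel."""
    v = np.zeros(dim)
    for t in tokens:
        h = int(hashlib.md5(t.encode("utf-8")).hexdigest(), 16)
        sign = 1.0 if (h >> 64) % 2 == 0 else -1.0
        v[h % dim] += sign
    return v

def hash_kernel(a, b, dim=64):
    # approximate kernel between two token bags via their hashed images
    return float(hash_features(a, dim) @ hash_features(b, dim))
```

Because the feature dimension is fixed up front, the kernel matrix can be accumulated in one pass over a data stream, which is the point of the construction; the deviation bounds in the paper quantify the collision-induced error.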
Classifiers in Almost Empty Spaces
 In 15th International Conference on Pattern Recognition
2000
Cited by 24 (7 self)
Recent developments in defining and training statistical classifiers make it possible to build reliable classifiers in very small sample size problems. Using these techniques, advanced problems may be tackled, such as pixel-based image recognition and dissimilarity-based object classification. It will be explained and illustrated how recognition systems based on support vector machines and subspace classifiers circumvent the curse of dimensionality, and may even find nonlinear decision boundaries for small training sets represented in Hilbert space.
Hash kernels
 Proc. Intl. Workshop on Artificial Intelligence and Statistics. Society for Artificial Intelligence and Statistics
2009
Cited by 22 (4 self)
We propose hashing to facilitate efficient kernels. This generalizes previous work using sampling and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we give deviation bounds from the exact kernel matrix. This has applications to estimation on strings and graphs.