Marginalized kernels between labeled graphs
Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 194 (15 self)
A new kernel function between two labeled graphs is presented. Feature vectors are defined as the counts of label paths produced by random walks on graphs. The kernel computation finally boils down to obtaining the stationary state of a discrete-time linear system, and is thus performed efficiently by solving simultaneous linear equations. Our kernel is based on an infinite-dimensional feature space, so it is fundamentally different from other string or tree kernels based on dynamic programming. We present promising empirical results in the classification of chemical compounds.
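The linear-system computation the abstract describes can be sketched in NumPy. This is a minimal illustration, not the paper's exact formulation: the function name, the uniform start distribution over matching node pairs, the constant stopping probability, and the discrete node labels are all assumptions made here.

```python
import numpy as np

def marginalized_graph_kernel(A1, labels1, A2, labels2, stop=0.1):
    """Random-walk kernel between two labeled graphs (simplified sketch)."""
    labels1, labels2 = np.asarray(labels1), np.asarray(labels2)
    n1, n2 = len(labels1), len(labels2)
    # States of the joint random walk are node pairs (i, j); only pairs
    # with equal labels contribute to matching label paths.
    m = (labels1[:, None] == labels2[None, :]).ravel().astype(float)
    # Row-normalised transition matrices of the two random walks.
    T1 = A1 / np.maximum(A1.sum(axis=1, keepdims=True), 1e-12)
    T2 = A2 / np.maximum(A2.sum(axis=1, keepdims=True), 1e-12)
    # Product-walk transitions, damped by the continuation probability
    # (1 - stop) on each graph and restricted to label-matching pairs.
    W = (1 - stop) ** 2 * np.kron(T1, T2) * m[None, :]
    start = m / (n1 * n2)  # uniform start over matching pairs
    # Stationary state of the discrete-time linear system x = start + W^T x,
    # obtained by solving simultaneous linear equations rather than by
    # enumerating random walks.
    x = np.linalg.solve(np.eye(n1 * n2) - W.T, start)
    return stop ** 2 * x.sum()  # both walks stop, emitting the label path
```

Because the continuation probability is strictly below one, the geometric series over walk lengths converges and the single linear solve sums it exactly.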
Probability product kernels
Journal of Machine Learning Research, 2004
Cited by 180 (9 self)
The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over the sample space and a general inner product is then evaluated as the integral of the product of pairs of distributions. The kernel is straightforward to evaluate for all exponential family models such as multinomials and Gaussians and yields interesting nonlinear kernels. Furthermore, the kernel is computable in closed form for latent distributions such as mixture models, hidden Markov models and linear dynamical systems. For intractable models, such as switching linear dynamical systems, structured mean-field approximations can be brought to bear on the kernel evaluation. For general distributions, even if an analytic expression for the kernel is not feasible, we show a straightforward sampling method to evaluate it. Thus, the kernel permits discriminative learning methods, including support vector machines, to exploit the properties, metrics and invariances of the generative models we infer from each datum. Experiments are shown using multinomial models for text, hidden Markov models for biological data sets and linear dynamical systems for time series data.
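For the Gaussian case mentioned above, the kernel with power one (the expected likelihood case) has a simple closed form: the integral of the product of two Gaussian densities is itself a Gaussian density in the mean difference, with the covariances summed. A minimal sketch, with the function name chosen here for illustration:

```python
import numpy as np

def expected_likelihood_kernel(mu1, S1, mu2, S2):
    """Probability product kernel with rho = 1 between N(mu1, S1) and
    N(mu2, S2): integral of the product of the two densities."""
    d = len(mu1)
    S = S1 + S2                                  # covariance of the result
    diff = np.asarray(mu1) - np.asarray(mu2)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
    # Gaussian density with covariance S1 + S2 evaluated at mu1 - mu2.
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(S, diff))
```

The kernel is symmetric by construction and decays as the two means move apart relative to the combined covariance.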
The pyramid match kernel: Efficient learning with sets of features
Journal of Machine Learning Research, 2007
Cited by 136 (10 self)
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
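The weighted histogram intersection described above can be sketched in one dimension; real applications operate on multi-dimensional local feature sets, and the bin layout and parameter names here are assumptions made for illustration:

```python
import numpy as np

def pyramid_match(X, Y, L=4):
    """Simplified 1-D pyramid match for feature values in [0, 2**L).

    Bin width doubles at each level; new matches discovered at level i
    (via histogram intersection) are weighted by 1 / 2**i, so pairs that
    match in finer cells count more toward the similarity.
    """
    score, prev = 0.0, 0.0
    for i in range(L + 1):
        width = 2 ** i
        edges = np.arange(0, 2 ** L + width, width)
        hx, _ = np.histogram(X, bins=edges)
        hy, _ = np.histogram(Y, bins=edges)
        inter = np.minimum(hx, hy).sum()   # implicit matches at this level
        score += (inter - prev) / 2 ** i   # weight only the new matches
        prev = inter
    return score
```

The cost is linear in the number of features per level, which is what lets the method avoid the explicit correspondence search the abstract calls impractical for large sets.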
A kernel between sets of vectors
In International Conference on Machine Learning, 2003
Cited by 130 (8 self)
In various application domains, including image recognition, it is natural to represent each example as a set of vectors. With a base kernel we can implicitly map these vectors to a Hilbert space and fit a Gaussian distribution to the whole set using Kernel PCA. We define our kernel between examples as Bhattacharyya’s measure of affinity between such Gaussians. The resulting kernel is computable in closed form and enjoys many favorable properties, including graceful behavior under transformations, potentially justifying the vector set representation even in cases when more conventional representations also exist.
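The closed form mentioned above is the Bhattacharyya affinity between two Gaussians. The sketch below assumes the Gaussians are already fit (the paper fits them implicitly in a Hilbert space via kernel PCA, which is omitted here), and the function name is illustrative:

```python
import numpy as np

def bhattacharyya_affinity(mu1, S1, mu2, S2):
    """Closed-form Bhattacharyya affinity between N(mu1, S1) and N(mu2, S2):
    the integral of sqrt(p1 * p2), i.e. exp(-D_B) where D_B is the
    Bhattacharyya distance computed from the average covariance."""
    S = 0.5 * (S1 + S2)
    diff = np.asarray(mu1) - np.asarray(mu2)
    db = 0.125 * diff @ np.linalg.solve(S, diff) + 0.5 * np.log(
        np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))
    )
    return np.exp(-db)
```

The affinity equals 1 exactly when the two Gaussians coincide and decreases as their means or covariances diverge.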
Diffusion Kernels on Statistical Manifolds
2004
Cited by 115 (9 self)
A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated; improvements are obtained over the Gaussian and linear kernels that are standard for text classification.
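For the multinomial special case, a commonly used first-order approximation of this heat kernel replaces the Euclidean distance inside a Gaussian kernel with the Fisher geodesic distance on the simplex. A sketch under that assumption, with the dimension-dependent normalizing constant omitted and names chosen for illustration:

```python
import numpy as np

def multinomial_diffusion_kernel(theta1, theta2, t=0.5):
    """First-order approximation of the heat kernel between two points
    of the multinomial simplex (probability vectors).

    Under the Fisher information metric the simplex is isometric to a
    sphere orthant, so the geodesic distance between distributions is
    d = 2 * arccos(sum_i sqrt(theta1_i * theta2_i)); the kernel then
    generalizes the Gaussian kernel with d replacing Euclidean distance.
    """
    bc = np.sum(np.sqrt(np.asarray(theta1) * np.asarray(theta2)))
    d = 2.0 * np.arccos(np.clip(bc, -1.0, 1.0))  # clip guards rounding
    return np.exp(-d ** 2 / (4.0 * t))           # normalizer omitted
```

Applied to term-frequency vectors, this behaves like an RBF kernel whose notion of distance respects the multinomial geometry rather than raw Euclidean coordinates.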
Bhattacharyya and Expected Likelihood Kernels
In Conference on Learning Theory, 2003
Cited by 37 (2 self)
We introduce a new class of kernels between distributions. These induce a kernel on the input...
The Locally Weighted Bag of Words Framework for Document Representation
Cited by 17 (3 self)
The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document representation that goes beyond the bag of words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex, thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation is able to robustly capture medium- and long-range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability for various text processing tasks.
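The local-smoothing idea can be sketched as follows; the number of anchor positions, the Gaussian location weighting, and the function name are illustrative assumptions rather than the paper's exact construction:

```python
import numpy as np

def lowbow_curve(doc, vocab, n_anchors=5, sigma=2.0):
    """Locally weighted bag of words (sketch): instead of one global
    histogram, compute a kernel-smoothed word histogram at several anchor
    positions along the document.  Each row lies in the multinomial
    simplex, so the document becomes a curve that preserves medium- and
    long-range sequential structure."""
    ids = np.array([vocab[w] for w in doc])  # word indices per position
    n, V = len(doc), len(vocab)
    curve = np.zeros((n_anchors, V))
    for k, mu in enumerate(np.linspace(0, n - 1, n_anchors)):
        # Gaussian weight of each document position relative to anchor mu.
        w = np.exp(-0.5 * ((np.arange(n) - mu) / sigma) ** 2)
        np.add.at(curve[k], ids, w)    # weighted word counts
        curve[k] /= curve[k].sum()     # normalise onto the simplex
    return curve
```

Setting sigma very large recovers the ordinary bag of words at every anchor, while smaller values localise each histogram to its neighborhood of the document.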
Neural Methods for Non-Standard Data
Proceedings of the 12th European Symposium on Artificial Neural Networks (ESANN 2004), d-side pub, 2004
Cited by 12 (3 self)
Standard pattern recognition provides effective and noise-tolerant tools for machine learning tasks; however, most approaches deal only with real vectors of a finite and fixed dimensionality. In this tutorial paper, we give an overview of extensions of pattern recognition towards non-standard data that are not contained in a finite-dimensional space, such as strings, sequences, trees, graphs, or functions. Two major directions can be distinguished in the neural networks literature. First, models can be based on a similarity measure adapted to non-standard data, including kernel methods for structures as a very prominent approach, but also alternative metric-based algorithms and functional networks. Second, non-standard data can be processed recursively within supervised and unsupervised recurrent and recursive networks and fully recurrent systems.
A New SVM Approach to Speaker Identification and Verification Using Probabilistic Distance Kernels
Proc. Eurospeech ’03, 2003
Hyperplane Margin Classifiers on the Multinomial Manifold
In Proc. of the 21st International Conference on Machine Learning, 2004
Cited by 8 (2 self)
The assumptions behind linear classifiers for categorical data are examined and reformulated in the context of the multinomial manifold, the simplex of multinomial models furnished with the Riemannian structure induced by the Fisher information.