Results 1-10 of 121
Convolution Kernels on Discrete Structures
1999
"... We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the fa ..."
Abstract

Cited by 403 (0 self)
 Add to MetaCart
(Show Context)
We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the family of radial basis kernels. It can also be used to define kernels in the form of joint Gibbs probability distributions. Kernels can be built from hidden Markov random fields, generalized regular expressions, pair HMMs, or ANOVA decompositions. Uses of the method lead to open problems involving the theory of infinitely divisible positive definite functions. Fundamentals of this theory and the theory of reproducing kernel Hilbert spaces are reviewed and applied in establishing the validity of the method.
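The construction summarized here is what later literature calls the R-convolution kernel: an object is decomposed into parts via a relation R, part kernels are multiplied, and the products are summed over all valid decompositions. A sketch in standard notation (the symbols R and K_d are the conventional ones for this construction, not quoted from the abstract):

K(x, y) \;=\; \sum_{(x_1, \ldots, x_D) \in R^{-1}(x)} \; \sum_{(y_1, \ldots, y_D) \in R^{-1}(y)} \; \prod_{d=1}^{D} K_d(x_d, y_d)

Here R^{-1}(x) denotes the set of decompositions of x into D parts and each K_d is a positive definite kernel on the d-th part type; the RKHS and infinite-divisibility theory reviewed in the paper is what guarantees that K itself is positive definite.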
Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
"... A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergence ..."
Abstract

Cited by 323 (50 self)
 Add to MetaCart
(Show Context)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Second, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
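As a rough illustration of the hard-clustering half of this abstract, the sketch below runs k-means-style clustering with a pluggable Bregman divergence; the property it relies on is that, for any Bregman divergence, the cluster representative minimizing the total divergence is the arithmetic mean, so only the assignment step changes. Function names and the two example divergences are illustrative choices, not code from the paper.

import numpy as np

def bregman_hard_clustering(X, k, divergence, n_iter=100, seed=0):
    # Hard clustering with a user-supplied Bregman divergence d(x, c).
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to the nearest centroid
        # under the chosen divergence.
        dists = np.array([[divergence(x, c) for c in centroids] for x in X])
        labels = dists.argmin(axis=1)
        # Update step: the optimal centroid is the plain mean of the
        # assigned points, regardless of which Bregman divergence is used.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Squared Euclidean distance recovers classical k-means; the KL divergence
# (for points on the probability simplex) gives information-theoretic clustering.
squared_euclidean = lambda x, c: float(np.sum((x - c) ** 2))
kl_divergence = lambda x, c: float(np.sum(x * np.log(x / np.maximum(c, 1e-12))))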
Diffusion kernels on graphs and other discrete input spaces
In: Proceedings of the 19th International Conference on Machine Learning, 2002
"... The application of kernelbased learning algorithms has, so far, largely been confined to realvalued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation ..."
Abstract

Cited by 194 (7 self)
 Add to MetaCart
(Show Context)
The application of kernel-based learning algorithms has, so far, largely been confined to real-valued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation idea. In particular, we focus on generating kernels on graphs, for which we propose a special class of exponential kernels called diffusion kernels, which are based on the heat equation and can be regarded as the discretization of the familiar Gaussian kernel of Euclidean space.
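A minimal numerical sketch of this construction (the small graph and the bandwidth beta below are made-up illustrative values, and scipy is assumed to be available): the diffusion kernel is the matrix exponential of the negative graph Laplacian scaled by beta.

import numpy as np
from scipy.linalg import expm

# Adjacency matrix of a small undirected example graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Combinatorial graph Laplacian L = D - A, with D the diagonal degree matrix.
L = np.diag(A.sum(axis=1)) - A

# Diffusion (heat) kernel K = exp(-beta * L), obtained by matrix exponentiation;
# it solves the discrete heat equation on the graph and is positive definite,
# so K[i, j] can be used directly as a kernel between vertices i and j.
beta = 0.5
K = expm(-beta * L)
print(np.round(K, 3))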
Sparseness of support vector machines
2003
"... Support vector machines (SVMs) construct decision functions that are linear combinations of kernel evaluations on the training set. The samples with nonvanishing coefficients are called support vectors. In this work we establish lower (asymptotical) bounds on the number of support vectors. On our w ..."
Abstract

Cited by 143 (23 self)
 Add to MetaCart
Support vector machines (SVMs) construct decision functions that are linear combinations of kernel evaluations on the training set. The samples with non-vanishing coefficients are called support vectors. In this work we establish lower (asymptotic) bounds on the number of support vectors. Along the way we prove several results which are important for the understanding of SVMs. In particular, we describe the “limit” to which SVM decision functions tend, discuss the corresponding notion of convergence, and provide some results on the stability of SVMs using subdifferential calculus in the associated reproducing kernel Hilbert space.
Diffusion kernels on graphs and other discrete structures
In: Proceedings of the ICML, 2002
"... The application of kernelbased learning algorithms has, so far, largely been confined to realvalued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation ..."
Abstract

Cited by 138 (4 self)
 Add to MetaCart
(Show Context)
The application of kernel-based learning algorithms has, so far, largely been confined to real-valued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation idea. In particular, we focus on generating kernels on graphs, for which we propose a special class of exponential kernels, based on the heat equation, called diffusion kernels, and show that these can be regarded as the discretisation of the familiar Gaussian kernel of Euclidean space.
A comparison of the Sherali-Adams, Lovász-Schrijver and Lasserre relaxations for 0-1 programming
Mathematics of Operations Research, 2001
"... ..."
Fast rates for support vector machines using Gaussian kernels
Ann. Statist., 2004
"... We establish learning rates up to the order of n −1 for support vector machines with hinge loss (L1SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore we ..."
Abstract

Cited by 52 (7 self)
 Add to MetaCart
(Show Context)
We establish learning rates up to the order of n^{-1} for support vector machines with hinge loss (L1-SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore, we introduce a new geometric noise condition for distributions that is used to bound the approximation error of Gaussian kernels in terms of their widths.
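For context, the L1-SVM referred to here is the usual regularized hinge-loss estimator over a reproducing kernel Hilbert space H (standard notation, not quoted from the paper):

f_{D,\lambda} \;=\; \arg\min_{f \in H} \; \lambda \|f\|_H^2 \;+\; \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)

and the learning rates in question bound how quickly the risk of f_{D,\lambda} approaches the Bayes risk as the sample size n grows, under the stated noise conditions.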
Support Vector Machine Classification of Microarray Gene Expression Data
1999
"... We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). We describe SVMs that use different similarity metrics including a simple dot product of gene ..."
Abstract

Cited by 46 (0 self)
 Add to MetaCart
We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). We describe SVMs that use different similarity metrics including a simple dot product of gene expression vectors, polynomial versions of the dot product, and a radial basis function. Compared to the other SVM similarity metrics, the radial basis function SVM appears to provide superior performance in identifying sets of genes with a common function using expression data. In addition, SVM performance is compared to four standard machine learning algorithms. SVMs have many features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers.
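The paper predates today's off-the-shelf SVM libraries; the sketch below is a present-day illustration of the three similarity metrics it compares (linear, polynomial, and RBF kernels), using scikit-learn and randomly generated placeholder data in place of real expression profiles.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder expression matrix: rows = genes, columns = experiments,
# with y marking (hypothetical) membership in a functional class.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 80))
y = rng.integers(0, 2, size=200)

# The three similarity metrics discussed in the abstract: a plain dot product
# (linear kernel), polynomial versions of it, and a radial basis function.
for kernel, params in [("linear", {}), ("poly", {"degree": 3}), ("rbf", {"gamma": "scale"})]:
    clf = SVC(kernel=kernel, C=1.0, **params)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:6s} kernel: mean CV accuracy = {score:.3f}")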