## Convolution Kernels on Discrete Structures (1999)

Citations: 403 (0 self)

### BibTeX

```bibtex
@MISC{Haussler99convolutionkernels,
  author = {David Haussler},
  title  = {Convolution Kernels on Discrete Structures},
  year   = {1999}
}
```

### Abstract

We introduce a new method of constructing kernels on sets whose elements are discrete structures such as strings, trees, and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the family of radial basis kernels. It can also be used to define kernels in the form of joint Gibbs probability distributions. Kernels can be built from hidden Markov random fields, generalized regular expressions, pair-HMMs, or ANOVA decompositions. Uses of the method lead to open problems involving the theory of infinitely divisible positive definite functions. Fundamentals of this theory and of the theory of reproducing kernel Hilbert spaces are reviewed and applied in establishing the validity of the method.
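To make the construction concrete, here is a minimal sketch (hypothetical names and a deliberately simple decomposition, not code from the paper) of an R-convolution kernel on strings: each string is decomposed into a (prefix, suffix) pair, and the kernel sums the product of part kernels over all decompositions of both arguments.

```python
from itertools import product

def match(a, b):
    # Trivially positive definite base kernel on parts: exact match.
    return 1.0 if a == b else 0.0

def conv_kernel(x, y, k1=match, k2=match):
    # Sum k1(prefix_x, prefix_y) * k2(suffix_x, suffix_y) over every
    # way of splitting x and y into (prefix, suffix) pairs.
    total = 0.0
    for i, j in product(range(len(x) + 1), range(len(y) + 1)):
        total += k1(x[:i], y[:j]) * k2(x[i:], y[j:])
    return total

print(conv_kernel("abc", "abc"))  # 4.0: one contribution per matching split point
```

Haussler's theorem is that such a sum of products of positive definite part kernels over a finite decomposition relation is itself positive definite, so richer part kernels than exact match can be plugged in freely.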

### Citations

9946 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...that is, they can be accomplished without ever explicitly representing the feature vector {φ_n(x)}_{n≥1}, relying instead only on indirect computations of the kernel K(x, y) or the distance d(x, y) [28, 31, 2, 13, 23, 30, 17] (see also the bibliography at http://svm.first.gmd.de). The kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence b...

4597 | A tutorial on hidden Markov models and selected applications in speech processing
- Rabiner
- 1989
Citation Context: ...similar objects have the same length. This occurs, for example, when the strings consist of amino acids representing proteins, nucleic acids representing genes, or phonemes representing spoken words [26, 4, 9, 24]. In these contexts, some objects may be missing components that other similar objects have. However, we can align any two object strings so that their corresponding components are adjacent, using a s...

4144 | Introduction to Automata Theory, Languages, and Computation
- Hopcroft, Ullman
- 1979
Citation Context: ...∪_{r≥1} L(r). Finally, the regular languages are defined to be the smallest set of languages that contain {ε} and {a} for all letters a ∈ A, and are closed under union, concatenation, and Kleene star [14]. The operations of convex combination, simple convolution, and γ-iterated convolution may be used to define a class of probability distributions on regular languages called regular probability distr...

4055 | Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images
- Geman, Geman
- 1984
Citation Context: ...tion on X × X that we call a Gibbs kernel (Section 3.5). These kernels may have promising applications in areas where structures can be modeled generatively by Hidden Markov Random Fields (HMRFs) [12, 18, 5]. Convolution kernels can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. We introduce a class of generalized regular expressions to define ker...

2497 | A tutorial on support vector machines for pattern recognition - Burges - 1998

2326 | An Introduction to Probability Theory and Its Applications, Volume 1
- Feller
- 1968
Citation Context: ...ion of distributions with this relation R corresponds to multiplication of generating functions. By differentiating the generating function, one obtains the moments of the distribution (see, e.g., [6]). Other kinds of convolutions can be used to represent combinatorial counting problems, because if g_d(x_d) = 1 for all x_d, then g_1 ⋆ ⋯ ⋆ g_D(x) is the cardinality of R⁻...
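The generating-function fact quoted from Feller is easy to check numerically; a small sketch with numpy and illustrative distributions of my own choosing:

```python
import numpy as np

# pmf of a fair coin and of the sum of two fair coins
p = np.array([0.5, 0.5])
q = np.array([0.25, 0.5, 0.25])

# Convolving pmfs gives the distribution of the sum of independent draws;
# the corresponding generating functions G(s) = sum_k p_k s^k multiply.
conv = np.convolve(p, q)
Gp = np.polynomial.Polynomial(p)
Gq = np.polynomial.Polynomial(q)
assert np.allclose((Gp * Gq).coef, conv)

# Differentiating the generating function and evaluating at s = 1
# yields the mean of the distribution.
mean = (Gp * Gq).deriv()(1.0)
print(mean)  # 1.5: expected number of heads in three fair coin flips
```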

1418 | Spline Models for Observational Data
- Wahba
- 1990
Citation Context: ...e kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence between kernels and Gaussian processes defined on the set X [3, 32, 21]. We do not pursue this avenue in this paper, but the kernels we develop can be plugged directly into Gaussian process methods. Convolution kernels are obtained from other kernels by a certain sum ove...

1165 | Graphical Models
- Lauritzen
- 1996
Citation Context: ...tion on X × X that we call a Gibbs kernel (Section 3.5). These kernels may have promising applications in areas where structures can be modeled generatively by Hidden Markov Random Fields (HMRFs) [12, 18, 5]. Convolution kernels can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. We introduce a class of generalized regular expressions to define ker...

933 | Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
- Durbin
- 1998
Citation Context: ...strings that are derived from a common ancestor under the operations of insertion, deletion, and substitution of letters (Section 4.4). This and similar kernels are related to the pair-HMMs defined in [4]. This provides a new angle on the old field of syntactic pattern recognition, developed by Kung-Sun Fu and his colleagues [9, 10, 11]. Attempts to control the "width" parameter in generalized radial ...

759 | Linear Algebra and Its Applications
- Strang
- 1980
Citation Context: ...j = K(x_i, x_j) is positive definite, i.e. Σ_{ij} c_i c_j K_{ij} ≥ 0 for all c_1, …, c_N ∈ ℝ. Equivalently, a symmetric matrix is positive definite if all its eigenvalues are nonnegative; see, e.g., [29]. Many authors consider the more general case of complex-valued kernels. The relationship between the definitions used for that case and the ones used here for the real case is discussed in [1], sec...
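The eigenvalue characterization quoted in this snippet (note that the paper's "positive definite" is the condition usually called positive semidefinite) can be checked directly; a small numpy sketch:

```python
import numpy as np

def is_positive_semidefinite(K, tol=1e-10):
    # A symmetric matrix satisfies sum_ij c_i c_j K_ij >= 0 for all c
    # exactly when all of its eigenvalues are nonnegative.
    K = np.asarray(K, dtype=float)
    assert np.allclose(K, K.T), "kernel matrices must be symmetric"
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# The Gram matrix of a dot-product kernel is positive semidefinite...
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
print(is_positive_semidefinite(X @ X.T))   # True

# ...while an arbitrary symmetric matrix need not be.
print(is_positive_semidefinite([[0.0, 1.0], [1.0, 0.0]]))  # False
```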

503 | Real Analysis and Probability
- Dudley
- 2002
Citation Context: ...e kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence between kernels and Gaussian processes defined on the set X [3, 32, 21]. We do not pursue this avenue in this paper, but the kernels we develop can be plugged directly into Gaussian process methods. Convolution kernels are obtained from other kernels by a certain sum ove...

455 | The Geometry of Graphs and some of Its Algorithmic Applications
- Linial, London, et al.
- 1995
Citation Context: ...fies the triangle inequality d(x, y) ≤ d(x, z) + d(z, y), and has the property that d(x, x) = 0. However, for many pattern recognition applications, this is not sufficient for d to be a useful distance [20]. For a distance d to be useful, we need to actually embed the metric space (X, d) in a finite-dimensional Euclidean space ℝ^N, or in the space of all infinite square-summable sequences ℓ², via som...
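The kernel-to-distance correspondence behind these embeddings is d(x, y)² = K(x, x) − 2K(x, y) + K(y, y), the squared distance between feature vectors. A quick sketch with a dot-product kernel, for which it reduces to ordinary Euclidean distance:

```python
import numpy as np

def kernel(x, y):
    # An explicit dot-product kernel, so the identity can be verified.
    return float(np.dot(x, y))

def kernel_distance(x, y):
    # Squared feature-space distance: K(x,x) - 2 K(x,y) + K(y,y).
    return np.sqrt(kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print(kernel_distance(x, y))  # 5.0, the Euclidean distance ||x - y||
```

The same formula applies to any valid kernel, including ones whose feature space is infinite-dimensional, which is why a kernel on strings or trees immediately yields a distance on them.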

421 | Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
- Sankoff, Kruskal
- 1983
Citation Context: ...similar objects have the same length. This occurs, for example, when the strings consist of amino acids representing proteins, nucleic acids representing genes, or phonemes representing spoken words [26, 4, 9, 24]. In these contexts, some objects may be missing components that other similar objects have. However, we can align any two object strings so that their corresponding components are adjacent, using a s...

415 | Exploiting generative models in discriminative classifiers
- Jaakkola, Haussler
- 1998
Citation Context: ...ides a way of using the generative probability model inherent in an HMRF to define a notion of similarity between related structures. This idea will be further developed in a separate paper. (See also [16, 15] for an alternate way to do this.) 4 Iterated convolution kernels and generalized regular expressions. When X is countably infinite and X_d = X for 1 ≤ d ≤ D, as in the examples of kernels for strings and ...

219 | Syntactic Pattern Recognition and Applications
- Fu
- 1982
Citation Context: ...on 4.4). This and similar kernels are related to the pair-HMMs defined in [4]. This provides a new angle on the old field of syntactic pattern recognition, developed by Kung-Sun Fu and his colleagues [9, 10, 11]. Attempts to control the "width" parameter in generalized radial basis kernels derived from convolution kernels lead us to the important notion of infinitely divisible kernels, which we review (Secti...

216 | An equivalence between sparse approximation and support vector machines
- Girosi
- 1998
Citation Context: ...that is, they can be accomplished without ever explicitly representing the feature vector {φ_n(x)}_{n≥1}, relying instead only on indirect computations of the kernel K(x, y) or the distance d(x, y) [28, 31, 2, 13, 23, 30, 17] (see also the bibliography at http://svm.first.gmd.de). The kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence b...

170 | Using the Fisher kernel method to detect remote protein homologies
- Jaakkola, Diekhans, et al.
- 1999
Citation Context: ...ides a way of using the generative probability model inherent in an HMRF to define a notion of similarity between related structures. This idea will be further developed in a separate paper. (See also [16, 15] for an alternate way to do this.) 4 Iterated convolution kernels and generalized regular expressions. When X is countably infinite and X_d = X for 1 ≤ d ≤ D, as in the examples of kernels for strings and ...

162 | Support vector machines, reproducing kernel Hilbert spaces and their randomized GACV
- Wahba
- 1999
Citation Context: ...s to the important notion of infinitely divisible kernels, which we review (Section 6). Some open problems are mentioned in this regard. We also review the theory of reproducing kernel Hilbert spaces [22, 32, 33] (Section 7), and use it to derive several results mentioned in earlier sections. 2 Convolution kernels. 2.1 Kernels. Let X be a set and K : X × X → ℝ, where ℝ denotes the real numbers and ×...

136 | Comparing support vector machines with Gaussian kernels to radial basis function classifiers
- Scholkopf, Sung, et al.
- 1997
Citation Context: ...the component x_d of x. These features are then used to define a kernel K that in fact maps x implicitly into an infinite-dimensional feature space. Such kernels have proven quite useful in practice [27]. Continuing with Example 1 from Section 2, using the same primitive features {f_d(x_d) : 1 ≤ d ≤ D}, we can define the simple exponential kernel K_1 ⋆ ⋯ ⋆ K_D(x, y) = e^{Σ_{d=1}^{D} f_d(...
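A sketch of the simple exponential kernel pattern described in this snippet: exponentiate a sum of per-component feature products. The feature functions below are hypothetical placeholders, not taken from the paper.

```python
import math

def exp_kernel(x, y, feats):
    # e^{sum_d f_d(x_d) * f_d(y_d)} over the components of x and y
    return math.exp(sum(f(xd) * f(yd) for f, xd, yd in zip(feats, x, y)))

feats = [lambda v: v, lambda v: v * v]  # toy per-component features
print(exp_kernel((1.0, 2.0), (0.5, 1.0), feats))  # e^(0.5 + 4.0)
```

Each factor e^{f_d(x_d) f_d(y_d)} is a positive definite kernel in its component, so their product, and hence the whole kernel, is positive definite.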

121 | Introduction to Gaussian Processes
- MacKay
- 1998
Citation Context: ...e kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence between kernels and Gaussian processes defined on the set X [3, 32, 21]. We do not pursue this avenue in this paper, but the kernels we develop can be plugged directly into Gaussian process methods. Convolution kernels are obtained from other kernels by a certain sum ove...

109 | Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions
- Berg, Christensen, et al.
- 1984
Citation Context: ...e.g., [29]. Many authors consider the more general case of complex-valued kernels. The relationship between the definitions used for that case and the ones used here for the real case is discussed in [1], section 1.6, page 68. Virtually all of the results extend naturally to the complex case. It is readily verified that if each x ∈ X is represented by the sequence φ(x) = {φ_n(x)}_{n≥1} such that ...

109 | Probabilistic Kernel Regression Models
- Jaakkola, Haussler
- 1999
Citation Context: ...that is, they can be accomplished without ever explicitly representing the feature vector {φ_n(x)}_{n≥1}, relying instead only on indirect computations of the kernel K(x, y) or the distance d(x, y) [28, 31, 2, 13, 23, 30, 17] (see also the bibliography at http://svm.first.gmd.de). The kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a 1-1 correspondence b...

107 | Ridge regression learning algorithm in dual variables
- Saunders, Gammerman, et al.
- 1998

44 | Grammatical inference: Introduction and survey (Part 1)
- Fu, Booth
- 1975
Citation Context: ...on 4.4). This and similar kernels are related to the pair-HMMs defined in [4]. This provides a new angle on the old field of syntactic pattern recognition, developed by Kung-Sun Fu and his colleagues [9, 10, 11]. Attempts to control the "width" parameter in generalized radial basis kernels derived from convolution kernels lead us to the important notion of infinitely divisible kernels, which we review (Secti...

43 | A sparse representation for function approximation
- Poggio, Girosi
- 1998

40 | Global self-organization of all known protein sequences reveals inherent biological signatures
- Linial, Linial, et al.
- 1997
Citation Context: ...the sense that the Euclidean distance between φ(x) and φ(y) is close to the original distance d(x, y) for all x and y [20]. They apply these results to the problem of classifying protein sequences [19]. However, if (X, d) can be embedded in ℓ², as mentioned in the introduction, we can still take advantage of most of the classical pattern recognition, clustering, regression, and classification meth...

10 | On fractional Hadamard powers of positive definite matrices
- Fitzgerald, Horn
- 1977
Citation Context: ...the main theorem of which is that if a real symmetric n × n matrix K is positive and positive definite, then the fractional Schur power K^t = {K_ij^t} is positive definite for all t ≥ n − 2 [8]. (This result was rediscovered 19 years later [25].) Let 1 denote the all-1s vector and n denote (1, 2, …, n)^T. The example Fitzgerald and Horn supply to show this bound is tight is a matrix o...
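The FitzGerald-Horn statement quoted here can be illustrated numerically. The sketch below (numpy, with a δ of my own choosing) builds their tightness example M_δ = 11^T + δnn^T and confirms that a fractional Hadamard (entrywise) power with exponent t ≥ n − 2 is still positive semidefinite.

```python
import numpy as np

# If an n x n matrix is entrywise positive and positive (semi)definite,
# its fractional Hadamard power K^t = {K_ij^t} is again positive
# (semi)definite whenever t >= n - 2 (FitzGerald-Horn).
n = 5
one = np.ones(n)
idx = np.arange(1, n + 1, dtype=float)

# The tightness example from the snippet: M_delta = 11^T + delta * nn^T.
M = np.outer(one, one) + 1e-3 * np.outer(idx, idx)
assert np.linalg.eigvalsh(M).min() >= -1e-12   # M is PSD (rank 2)

t = n - 2 + 0.5                                # a fractional t >= n - 2
min_eig = np.linalg.eigvalsh(M ** t).min()     # M ** t is entrywise here
print(min_eig >= -1e-10)  # True: the Hadamard power stays PSD
```

For non-integer t below n − 2 and small enough δ, their example shows the entrywise power can fail to be positive semidefinite, which is why the bound is tight.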

2 | Hilbert Space Methods
- Mate
- 1989
Citation Context: ...s to the important notion of infinitely divisible kernels, which we review (Section 6). Some open problems are mentioned in this regard. We also review the theory of reproducing kernel Hilbert spaces [22, 32, 33] (Section 7), and use it to derive several results mentioned in earlier sections. 2 Convolution kernels. 2.1 Kernels. Let X be a set and K : X × X → ℝ, where ℝ denotes the real numbers and ×...

1 | Positive powers of positive positive definite matrices
- Rosen
- 1996
Citation Context: ...tric n × n matrix K is positive and positive definite, then the fractional Schur power K^t = {K_ij^t} is positive definite for all t ≥ n − 2 [8]. (This result was rediscovered 19 years later [25].) Let 1 denote the all-1s vector and n denote (1, 2, …, n)^T. The example Fitzgerald and Horn supply to show this bound is tight is a matrix of the form M_δ = 11^T + δnn^T. They show that ...
