## Positive Definite Rational Kernels (2003)

Venue: | In Proceedings of the 16th Annual Conference on Computational Learning Theory (COLT 2003) |

Citations: | 20 - 6 self |

### BibTeX

@INPROCEEDINGS{Cortes03positivedefinite,

author = {Corinna Cortes and Patrick Haffner and Mehryar Mohri},

title = {Positive Definite Rational Kernels},

booktitle = {Proceedings of the 16th Annual Conference on Computational Learning Theory (COLT 2003)},

year = {2003},

pages = {41--56},

publisher = {Springer}

}

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ... to use the full output weighted automata which contain the correct result in most cases. Kernel methods [13] are widely used in statistical learning techniques such as Support Vector Machines (SVMs) =-=[2,4,14]-=- due to their computational efficiency in high-dimensional feature spaces. Recently, a general kernel framework [Table: Semiring / Set / ⊕ / ⊗ / 0 / 1; Boolean: {0,1}, ∨, ∧, 0, 1; Probability: R+, +, ×, 0, 1; Log: R ∪ {−∞,+∞}, ⊕log, + ...] |

2171 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context ... to use the full output weighted automata which contain the correct result in most cases. Kernel methods [13] are widely used in statistical learning techniques such as Support Vector Machines (SVMs) =-=[2,4,14]-=- due to their computational efficiency in high-dimensional feature spaces. Recently, a general kernel framework [Table: Semiring / Set / ⊕ / ⊗ / 0 / 1; Boolean: {0,1}, ∨, ∧, 0, 1; Probability: R+, +, ×, 0, 1; Log: R ∪ {−∞,+∞}, ⊕log, + ...] |

2028 | Learning with Kernels
- Scholkopf, Smola
- 2002
Citation Context ...ill too high in many tasks to rely only on their one-best output, thus it is preferable instead to use the full output weighted automata which contain the correct result in most cases. Kernel methods =-=[13]-=- are widely used in statistical learning techniques such as Support Vector Machines (SVMs) [2,4,14] due to their computational efficiency in high-dimensional feature spaces. Recently, a general kernel... |

1291 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
Citation Context ...is a weighted transducer denoted by T1 ◦ T2 when the sum $[T_1 \circ T_2](x, y) = \bigoplus_{z \in \Sigma^*} [T_1](x, z) \otimes [T_2](z, y)$ =-=(2)-=- is well-defined and in K for all x ∈ Σ* and y ∈ ∆* [7]. 3 Rational Kernels - Definition. Definition 3. A kernel K is said to be rational if there exist a weighted transducer T = (Σ, ∆, Q, I, F, E, λ, ρ) over the semiring K and a function ψ : K → R such that for all x ∈ Σ* and y ∈ ∆*: ... |
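
The composition sum quoted in this context can be illustrated on finitely supported weighted relations. The following is a minimal sketch, not the paper's transducer algorithm: it assumes each transducer is given extensionally as a dictionary from string pairs to weights, and the names `compose`, `plus`, `times`, and `zero` are hypothetical.

```python
from collections import defaultdict


def compose(t1, t2, plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0):
    """Compose two finitely supported weighted relations over a semiring.

    t1 maps (x, z) pairs to weights and t2 maps (z, y) pairs to weights.
    The result maps (x, y) to the semiring sum over z of t1[x, z] (times) t2[z, y].
    The defaults correspond to the probability semiring (+, *).
    """
    # Index t2 by its input string z so matching pairs are found directly.
    by_input = defaultdict(list)
    for (z, y), w in t2.items():
        by_input[z].append((y, w))

    result = {}
    for (x, z), w1 in t1.items():
        for y, w2 in by_input.get(z, []):
            key = (x, y)
            result[key] = plus(result.get(key, zero), times(w1, w2))
    return result


# Illustrative weights only; they are not taken from the paper.
t1 = {("a", "u"): 0.5, ("a", "v"): 0.5}
t2 = {("u", "b"): 0.2, ("v", "b"): 0.4}

probability = compose(t1, t2)
tropical = compose(t1, t2, plus=min, times=lambda a, b: a + b, zero=float("inf"))
```

Passing the semiring operations as parameters lets the same function cover the probability semiring and the tropical semiring (min, +) without changing the loop.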

1201 | Binary codes capable of correcting deletions, insertions, and reversals
- Levenshtein
- 1966
Citation Context ...es the relationships between several families of kernels or similarity measures and rational kernels. 5.1 Edit-Distance. A common similarity measure in many applications is that of the edit-distance =-=[9]-=-. We denote by de(x, y) the edit-distance between two strings x and y over the alphabet Σ with cost 1 assigned to all edit operations. Proposition 4. Let Σ be a non-empty finite alphabet and let de be... |
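
For reference, the unit-cost edit-distance de(x, y) described in this context can be computed with the standard dynamic program. This is a minimal sketch of that textbook recurrence, not the weighted-transducer construction used in the paper; the function name `edit_distance` is an assumption.

```python
def edit_distance(x, y):
    """Edit-distance between strings x and y with cost 1 for each
    insertion, deletion, and substitution."""
    # prev[j] holds the distance between the current prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty string
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the table uses memory proportional to the length of `y` rather than to the product of both lengths.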

830 | Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
- Durbin, Eddy, et al.
- 1998
Citation Context ...miring, the sum is well-defined for all A, B, and T representing probability distributions. When K(A, B) is defined, Equation 4 can be equivalently written as: $K(A, B) = \psi\Big(\bigoplus_{(x, y) \in \Sigma^* \times \Delta^*} [A \circ T \circ B](x, y)\Big)$ =-=(5)-=- A general algorithm for computing rational kernels efficiently was given in [3]. It is based on the composition of weighted transducers and a general shortest-distance algorithm in a se... |

385 | Text classification using string kernels
- Lodhi, Cristianini, et al.
- 2001
Citation Context ...general, it is an arbitrary function mapping K to R. Figure 1 shows an example of a transducer over the probability semiring corresponding to the gappy n-gram kernel with decay factor λ as defined by =-=[10]-=-. Such gappy n-gram kernels are rational kernels [3]. Rational kernels can be naturally extended to kernels over weighted automata. Let A be a weighted automaton defined over the semiring K and the al... |

368 | Convolution Kernels on Discrete Structures
- Haussler
- 1999
Citation Context ...empotent semirings. We also study the relationship between rational kernels and some commonly used string kernels or similarity measures such as the edit-distance, the convolution kernels of Haussler =-=[6]-=-, and some string kernels used in the context of computational biology [8]. We show that these kernels are all specific instances of rational kernels. In each case, we explicitly describe the correspo... |

126 | Mismatch string kernels for SVM protein classification
- Leslie, Eskin, et al.
- 2003
Citation Context ...s and some commonly used string kernels or similarity measures such as the edit-distance, the convolution kernels of Haussler [6], and some string kernels used in the context of computational biology =-=[8]-=-. We show that these kernels are all specific instances of rational kernels. In each case, we explicitly describe the corresponding weighted transducer. These transducers are often simple and efficien... |

122 | Dynamic alignment kernels
- Watkins
- 1999
Citation Context ...s As mentioned before, given a set X and a distance or dis-similarity measure d : X × X → R+, a common method used to define a kernel K is the following. For all x, y ∈ X, $K(x, y) = \exp(-t\, d^2(x, y))$ =-=(15)-=- where t > 0 is some constant typically used for normalization. Gaussian kernels are defined in this way. However, such kernels K are not necessarily positive definite, e.g., for X = R, d(x, y) = |x −... |

80 | The kernel trick for distances
- Scholkopf
- 2000
Citation Context ... function defined by: $K'(x, y) = K(x, x_0) + K(y, x_0) - K(x, y) - K(x_0, x_0)$ (17) Then K is negative definite iff K′ is positive definite. (Footnote 3: Many of the results described by [1] are also included in =-=[12]-=- with the terminology of conditionally positive definite instead of negative definite kernels. We adopt the original terminology used by [1].) [Figure: weighted transducer with arcs b:ε/1, a:ε/1, ε:b/1, ε:a/1, b:a/1, a:b/1, b:b/0, a:a/0, ...] ... |
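
The construction in Equation 17 can be tried on a familiar case. The sketch below is an illustration under stated assumptions rather than anything taken from the paper: it uses the squared distance on the real line, a standard example of a negative definite kernel, and the helper names `shifted_kernel` and `squared_distance` are hypothetical.

```python
def shifted_kernel(K, x0):
    """Build K'(x, y) = K(x, x0) + K(y, x0) - K(x, y) - K(x0, x0)
    from a kernel K and a fixed base point x0."""
    def K_prime(x, y):
        return K(x, x0) + K(y, x0) - K(x, y) - K(x0, x0)
    return K_prime


def squared_distance(x, y):
    """Squared distance on the real line, used here as the example K."""
    return (x - y) ** 2


# The base point x0 = 1.0 is an arbitrary illustrative choice.
K_prime = shifted_kernel(squared_distance, x0=1.0)
```

For this choice of K, expanding the squares gives K′(x, y) = 2(x − x0)(y − x0), a product of the form f(x)f(y) and therefore positive definite, which agrees with the equivalence stated in the context.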

25 | Edit-distance of weighted automata: General definitions and algorithms
- Mohri
- 2003
Citation Context ... PDS kernel, and (2): de is a NDS kernel iff |Σ| = 1. Proof. The edit-distance between two strings, or weighted automata, can be represented by a simple weighted transducer over the tropical semiring =-=[11]-=-. Since the edit-distance is symmetric, this shows that de is a symmetric rational kernel. Figure 2(a) shows the corresponding transducer when the alphabet is Σ = {a, b}. The cost of the alignment bet... |

7 | Rational kernels
- Cortes, Haffner, Mohri
- 2002
Citation Context ... based on weighted transducers or rational relations, rational kernels, was introduced to extend kernel methods to the analysis of variable-length sequences or more generally weighted automata =-=[3]-=-. It was shown that there are general and efficient algorithms for computing rational kernels. Rational kernels have been successfully used for applications such as spoken-dialog classification. Not a... |

2 | Harmonic Analysis on Semigroups. Springer-Verlag: Berlin-New
- Berg, Christensen, et al.
- 1984
Citation Context ...ernels. Rational kernels have been successfully used for applications such as spoken-dialog classification. Not all rational kernels are positive definite, or equivalently verify the Mercer condition =-=[1]-=-, a condition that guarantees the convergence of discriminant classification algorithms such as SVMs. This motivates the study undertaken [Table: Semiring / Set / ⊕ / ⊗ / 0 / 1; Boolean: {0,1}, ∨, ∧, 0, 1; Probabilit...] |