## Sequence Classification in the Jensen-Shannon Embedding

### Citations

2781 | Learning with Kernels
- Schölkopf, Smola
- 2002
Citation Context: ...erg, 1938), the JS divergence induces an embedding of the distributions in a real Hilbert space. The Jensen-Shannon kernel is defined as the dot product in this embedding. Schölkopf (Schölkopf, 2000; Schölkopf & Smola, 2002) has shown that conditionally positive definite (CPD) kernels (for instance obtained by taking the opposite of CND distances) can directly be u...

1425 | Multivariate Analysis
- Mardia, Kent, et al.
- 1979
Citation Context: ...distances) can directly be used in translation invariant learning algorithms. The present paper describes a more general technique based on multidimensional scaling (MDS) theory (Cox & Cox, 2001; Mardia et al., 1979) to compute a true positive definite (PD) kernel from any CND distance. In contrast to CPD kernels, the resulting kernel can be used in any kernel-based algorithm, no matter whether the learning meth...
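The step described in this excerpt, turning a CND distance matrix into a true PD kernel, is the classical double-centering construction from MDS. The following is a minimal numpy sketch of that idea, not the paper's code; squared Euclidean distances (a standard CND example) stand in for the JS divergence, and all names are illustrative:

```python
import numpy as np

# Double centering from classical MDS: given a matrix D of squared CND
# distances, K = -0.5 * H D H is positive (semi)definite up to numerical
# tolerance, so it can serve directly as a kernel matrix.

rng = np.random.default_rng(0)

# Toy CND squared distances: squared Euclidean distances are a classic example.
X = rng.normal(size=(6, 3))
D = np.square(X[:, None, :] - X[None, :, :]).sum(axis=-1)

n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
K = -0.5 * H @ D @ H                  # candidate PD kernel matrix

eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-10         # no significantly negative eigenvalues
```

The eigenvalue check is only a numerical sanity test on one example, not a proof of positive definiteness.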

1221 | Kernel Methods for Pattern Analysis
- Shawe-Taylor, Cristianini
- 2004
Citation Context: ...of features, i.e. of empirical distributions, extracted from the sequences. The first are the joint distributions of N-grams. These features have been successfully applied to real-world applications (Shawe-Taylor & Cristianini, 2004); however, they can fail to model long-range dependencies or complex time-dependent dynamics. Rather than considering local features like N-grams, an alternative approach relies on dynamical features...
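The "joint distributions of N-grams" mentioned in the excerpt can be sketched as follows; this is a generic illustration (here with N = 2), with names chosen for clarity rather than taken from the paper:

```python
from collections import Counter

# Empirical N-gram distribution of a symbol sequence: the "local features"
# contrasted with dynamical features in the excerpt.
def ngram_distribution(seq, n=2):
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

d = ngram_distribution("abab", n=2)   # bigrams: ab, ba, ab
assert d == {"ab": 2 / 3, "ba": 1 / 3}
```

Such per-sequence distributions are exactly the kind of empirical distributions the JS divergence is then computed between.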

650 | Divergence measures based on the Shannon entropy
- Lin
- 1991
Citation Context: ...d on differences between probabilities, while it is often more relevant to use probability (log-)ratios. In this context, measures based on the Shannon entropy, such as the Kullback-Leibler divergence (Lin, 1991) and the Jensen-Shannon divergence (Lin, 1991), are commonly used. The Jensen-Shannon (JS) divergence has several advantages over the Kullback-Leibler divergence. In particular, in this paper we explo...
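For concreteness, here is a small sketch of the two divergences named in the excerpt for discrete distributions; the function names are illustrative, and base-2 logarithms are an assumption (they bound JS in [0, 1]):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence, with 0*log(0/.) terms treated as 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])

d = js(p, q)
assert 0.0 <= d <= 1.0      # with log base 2, JS lies in [0, 1]
assert js(p, p) == 0.0
```

Unlike KL, JS is symmetric and always finite, and its square root is a metric, which is the property the paper exploits.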

430 | Improved backing-off for m-gram language modeling - Kneser, Ney - 1995

393 | Functions of Positive and Negative Type and their Connection with the Theory of Integral Equations
- Mercer
- 1909
Citation Context: ...D. Proof. If d(., .) is CND, the following inequalities hold: −2(Hz)ᵀA(Hz) ≤ 0 for all z ∈ Rⁿ ⟺ zᵀ(HAH)z ≥ 0 for all z ∈ Rⁿ, which explicitly defines K = HAH as a PD matrix. According to the Mercer theorem (Mercer, 1909), K is a dot product matrix for a set of n points embedded into a Hilbert space: kij = 〈φ(xi), φ(xj)〉 for all 1 ≤ i, j ≤ n. Let us show that the distance between two points xi, xj in this embedding i...
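The claim in this excerpt, that K is a dot product matrix whose embedding reproduces the original distances, can be checked numerically by factoring K with an eigendecomposition. A hedged sketch, again using squared Euclidean distances as the stand-in CND distance:

```python
import numpy as np

# Factor the double-centered matrix K to get explicit embedding coordinates,
# then check that squared distances between embedded points reproduce D.

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
D = np.square(X[:, None, :] - X[None, :, :]).sum(axis=-1)  # CND squared distances

n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ D @ H

w, V = np.linalg.eigh(K)
w = np.clip(w, 0.0, None)      # drop tiny negative eigenvalues from round-off
Phi = V * np.sqrt(w)           # row i holds the coordinates of phi(x_i)

# ||phi(x_i) - phi(x_j)||^2 = k_ii + k_jj - 2 k_ij = d(x_i, x_j)^2
D_emb = np.square(Phi[:, None, :] - Phi[None, :, :]).sum(axis=-1)
assert np.allclose(D_emb, D, atol=1e-8)
```

This is the Mercer-style decomposition in finite dimensions: Phi @ Phi.T recovers K, and pairwise embedding distances recover D.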

191 | Metric spaces and positive definite functions
- Schoenberg
- 1938
Citation Context: ...in this paper we exploit the fact that the JS divergence is the square of a true metric distance and that it is a conditionally negative definite (CND) function. According to the Schoenberg theorem (Schoenberg, 1938), the JS divergence induces an embedding of the distributions in a real Hilbert space. The Jensen-Shannon kernel is defined as the dot product in this embedding. Schölkopf (Schölkopf, 2000; Schölkopf...

56 | Graph kernels for chemical informatics - Ralaivola, Swamidass, et al. - 2005

37 | Bhattacharyya and Expected Likelihood Kernels
- Jebara, Kondor
- 2003
Citation Context: ...ce in the Gaussian kernel. This kernel, however, does not correspond to the dot product in the JS embedding and, like the Gaussian kernel, it requires tuning a hyperparameter. An alternative approach (Jebara & Kondor, 2003) relies on building a generative model for each instance. The kernel value for two instances is obtained by integrating the product of the two corresponding generative models. This technique is more ...
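For the approach mentioned in this excerpt, a simple concrete case is two multinomial models, where integrating the product of the square-rooted models gives the Bhattacharyya kernel. A minimal sketch (illustrative names, not the paper's code):

```python
import numpy as np

# Bhattacharyya kernel between two discrete distributions: the rho = 1/2
# instance of the probability product kernels of Jebara & Kondor (2003).
def bhattacharyya_kernel(p, q):
    """k(p, q) = sum_i sqrt(p_i * q_i)."""
    return float(np.sum(np.sqrt(p * q)))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])

assert np.isclose(bhattacharyya_kernel(p, p), 1.0)   # self-similarity is 1
assert bhattacharyya_kernel(p, q) < 1.0
```

Note that this kernel is hyperparameter-free in this form, but, as the excerpt says, it is not the dot product in the JS embedding.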

20 | Diffusion of context and credit information in Markovian models - Bengio, Frasconi - 1995

19 | Frequency concepts and pattern detection for the analysis of motifs in networks - Schreiber, Schwöbbermeyer

7 | Inducing hidden Markov models to model long-term dependencies - Callut, Dupont - 2005

3 | The kernel trick for distances (NIPS)
- Schölkopf
- 2001
Citation Context: ...theorem (Schoenberg, 1938), the JS divergence induces an embedding of the distributions in a real Hilbert space. The Jensen-Shannon kernel is defined as the dot product in this embedding. Schölkopf (Schölkopf, 2000; Schölkopf & Smola, 2002) has shown that conditionally positive definite (CPD) kernels (for instance obtained by taking the opposite of CND distances) can directly be u...

1 | Sequence discrimination using phase-type distributions
- Callut, Dupont
- 2006
Citation Context: ...xt occurrence of w after having observed v. The distribution of these measures forms the First Passage Time (FPT) dynamics of a sequential process with respect to the feature (v, w). A previous work (Callut & Dupont, 2006) proposed to model the FPT with phase-type distributions. Two classifiers based on phase-type distributions were presented: (i) a maximum a posteriori classifier and (ii) an SVM with a marginalization...
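The first passage time described in the excerpt, the number of steps until the next occurrence of w after observing v, can be collected empirically as follows; this is a generic illustration with hypothetical names, not the paper's implementation:

```python
# Empirical first passage time (FPT) samples from symbol v to symbol w
# in a sequence: for each occurrence of v, the number of steps until the
# next occurrence of w.
def fpt_samples(seq, v, w):
    out = []
    for i, s in enumerate(seq):
        if s == v:
            for j in range(i + 1, len(seq)):
                if seq[j] == w:
                    out.append(j - i)
                    break
    return out

# In "abcab", 'a' occurs at positions 0 and 3; 'b' follows one step later
# in both cases.
assert fpt_samples("abcab", "a", "b") == [1, 1]
```

The empirical distribution of such samples is the FPT dynamics for the feature (v, w), which the cited work models with phase-type distributions.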

1 | Multidimensional Scaling
- Cox, Cox
- 2001
Citation Context: ...opposite of CND distances) can directly be used in translation invariant learning algorithms. The present paper describes a more general technique based on multidimensional scaling (MDS) theory (Cox & Cox, 2001; Mardia et al., 1979) to compute a true positive definite (PD) kernel from any CND distance. In contrast to CPD kernels, the resulting kernel can be used in any kernel-based algorithm, no matter whet...

1 | Adding a point to vector diagrams in multivariate analysis
- Gower
- 1968
Citation Context: ...or a new embedded point ϕ(xn+1) at the respective squared distance d(xi, xn+1)² = δi from each training sample ϕ(xi). A similar problem occurs in the theory of MDS and has been studied by Gower (Gower, 1968); this section is largely inspired by that work. The new point need not be computed explicitly in the embedding; the dot product between the new point and the training set samples is computed fr...
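Gower's out-of-sample construction sketched in this excerpt can be written in a few lines: from the squared distances δi of the new point to the training points, the dot products with the training embedding follow by centering, without computing ϕ(xn+1) itself. A hedged numpy sketch, with squared Euclidean distances standing in for the JS divergence:

```python
import numpy as np

# Recover g_i = <phi(x_new), phi(x_i)> from the squared distances delta_i
# and diag(K) alone, in the spirit of Gower (1968).

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))          # training points
x_new = rng.normal(size=3)           # new point

D = np.square(X[:, None, :] - X[None, :, :]).sum(axis=-1)
n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ D @ H                 # centered training kernel matrix

delta = np.square(X - x_new).sum(axis=1)   # squared distances to x_new

r = delta - np.diag(K)
g = -0.5 * (r - r.mean())            # dot products with the training embedding

# Check against the explicit centered-Euclidean embedding (dot products are
# invariant to the rotation ambiguity of the embedding).
Xc = X - X.mean(axis=0)
g_explicit = Xc @ (x_new - X.mean(axis=0))
assert np.allclose(g, g_explicit, atol=1e-8)
```

The mean-subtraction of r uses the fact that the training embedding is centered, so the g_i sum to zero.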

1 | Jensen-Shannon divergence and norm-based measures of discrimination and variation (Technical Report)
- Topsøe
- 2003
Citation Context: ...tive definite function is obtained by removing the condition ∑_{i=1}^{m} c_i = 0. Therefore, any negative definite function is also a CND function. It has recently been shown that the JS divergence is CND (Topsøe, 2003). 3. Embedding of a CND distance into a Hilbert space. In Section 2, it was shown that the JS divergence is the square of a CND distance metric. Schölkopf (Schölkopf, 2000; Schölkopf & Smola, 2002) ha...
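The CND property cited here (for coefficients summing to zero, the quadratic form of the JS divergence matrix is non-positive) can be spot-checked numerically. A random trial is of course no proof, only an illustration under the same base-2 JS definition assumed earlier:

```python
import numpy as np

def js(p, q):
    """Jensen-Shannon divergence (log base 2) between discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(4), size=5)             # 5 random distributions
J = np.array([[js(p, q) for q in P] for p in P])  # JS divergence matrix

c = rng.normal(size=5)
c -= c.mean()                                     # enforce sum(c) = 0
assert c @ J @ c <= 1e-10                         # CND: quadratic form <= 0
```

Dropping the zero-sum constraint on c, as the excerpt notes, would instead demand negative definiteness, a strictly stronger condition.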