## Nonextensive Entropic Kernels (2008)

### Download Links

- [www.cs.cmu.edu]
- [icml2008.cs.helsinki.fi]
- [omni.isr.ist.utl.pt]
- [www.lx.it.pt]
- [reports-archive.adm.cs.cmu.edu]
- [www.sailing.cs.cmu.edu]

Citations: 5 (1 self)

### BibTeX

@MISC{Martins08nonextensiveentropic,
  author = {André F. T. Martins and Mário A. T. Figueiredo and Pedro M. Q. Aguiar and Noah A. Smith and Eric P. Xing},
  title = {Nonextensive Entropic Kernels},
  year = {2008}
}

### Abstract

Positive definite kernels on probability measures have been recently applied in structured data classification problems. Some of these kernels are related to classic information theoretic quantities, such as mutual information and the Jensen-Shannon divergence. Meanwhile, driven by recent advances in Tsallis statistics, nonextensive generalizations of Shannon’s information theory have been proposed. This paper bridges these two trends. We introduce the Jensen-Tsallis q-difference, a generalization of the Jensen-Shannon divergence. We then define a new family of nonextensive mutual information kernels, which allow weights to be assigned to their arguments, and which includes the Boolean, Jensen-Shannon, and linear kernels as particular cases. We illustrate the performance of these kernels on text categorization tasks.
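The abstract's central objects can be sketched concretely. Below is an illustrative Python sketch (function names and the uniform-weight default are mine, not the paper's code) of the Tsallis entropy and the Jensen-Tsallis q-difference, T_q^π(p1, …, pm) = S_q(Σi πi pi) − Σi πi^q S_q(pi), which reduces to the Jensen-Shannon divergence at q = 1:

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1); Shannon as q -> 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(q - 1.0) < 1e-12:
        return -np.sum(p * np.log(p))     # Shannon limit
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def jensen_tsallis_q_difference(ps, q, weights=None):
    """T_q^pi(p_1..p_m) = S_q(sum_i pi_i p_i) - sum_i pi_i^q S_q(p_i).

    At q = 1 this is the (weighted) Jensen-Shannon divergence.
    """
    ps = np.asarray(ps, dtype=float)
    m = ps.shape[0]
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    mixture = w @ ps
    return tsallis_entropy(mixture, q) - sum(
        w[i] ** q * tsallis_entropy(ps[i], q) for i in range(m))

p1 = np.array([0.5, 0.5, 0.0])
p2 = np.array([0.1, 0.1, 0.8])
print(jensen_tsallis_q_difference([p1, p2], q=1.0))   # Jensen-Shannon divergence
print(jensen_tsallis_q_difference([p1, p2], q=0.5))   # a nonextensive variant
```

Note that for q ≠ 1 the weights πi^q no longer sum to one, so the q-difference of identical arguments need not vanish; only the q = 1 case recovers the familiar JS behavior.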

### Citations

8650 |
Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ...finite set X = {x1, . . . , xn} according to a probability distribution PX. An entropy function is said to be extensive if it is additive over independent variables. For example, the Shannon entropy (Cover & Thomas, 1991), H(X) ≜ −E[ln PX], is extensive: if X and Y are independent, then H(X, Y) = H(X) + H(Y). Another example is the family of Rényi entropies (Rényi, 1961), parameterized by q ≥ 0, Rq(X) ≜ (1/(1 − q)) ln Σ ... |
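The extensivity property quoted in this excerpt can be checked numerically: for independent X and Y the joint distribution is the outer product of the marginals, and both the Shannon entropy and the Rényi entropy R_q = (1/(1 − q)) ln Σi p_i^q add over the pair. A small sketch (names are mine):

```python
import numpy as np

def shannon(p):
    """Shannon entropy, with the 0 ln 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def renyi(p, q):
    """Renyi entropy R_q(p) = (1/(1-q)) ln sum_i p_i^q, for q >= 0, q != 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log(np.sum(p ** q)) / (1.0 - q)

px = np.array([0.2, 0.8])
py = np.array([0.5, 0.3, 0.2])
pxy = np.outer(px, py).ravel()        # joint distribution of independent X, Y

# Extensivity: joint entropy equals the sum of the marginal entropies.
print(shannon(pxy), shannon(px) + shannon(py))
print(renyi(pxy, 0.5), renyi(px, 0.5) + renyi(py, 0.5))
```

The Rényi case follows because Σij (px_i py_j)^q factorizes as (Σi px_i^q)(Σj py_j^q), so the logarithm splits into a sum.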

6163 |
The Mathematical Theory of Communication
- Shannon, Weaver
- 1949
Citation Context ...ls, and ∆^{n−1} ≜ {(x1, . . . , xn) ∈ R^n | Σi xi = 1, ∀i xi ≥ 0} denotes the (n − 1)-dimensional simplex. Inspired by the Shannon-Khinchin axiomatic formulation of Shannon’s entropy [Khinchin, 1957, Shannon and Weaver, 1949], Suyari [2004] proposed an axiomatic framework for nonextensive entropies and a uniqueness theorem. Let q ≥ 0 be a fixed scalar, called the entropic index, and let fq be a function defined on ∆^{n−1} ... |

2053 | Learning with Kernels
- Schölkopf, Smola
- 2002
Citation Context ...ence and the JT q-difference, which allow weighting their arguments. In this section, m = 2 (kernels involve pairs of measures). 5.1. Background on kernels: We begin with some basic results on kernels (Schölkopf & Smola, 2002). Below, X denotes a nonempty set, R+ denotes the nonnegative reals, and R++ ≜ R+ \ {0}. Definition 2: Let ϕ : X × X → R be a symmetric function, i.e., ϕ(y, x) = ϕ(x, y), for all x, y ∈ X. ϕ is called... |

1717 | Text categorization with support vector machines: Learning with many relevant features
- Joachims
- 1998
Citation Context ...ons. In fact, approaches that map data to a statistical manifold, where well-motivated non-Euclidean metrics may be defined (Lafferty & Lebanon, 2005), outperform SVM classifiers with linear kernels (Joachims, 1997). Some of these kernels have a natural information theoretic interpretation, creating a bridge between kernel methods and information theory (Cuturi et al., 2005; Hein & Bousquet, 2005). We reinforce... |

902 |
Algorithms on Strings, Trees and Sequences
- Gusfield
- 1997
Citation Context ...computed in O(|s| + |t|) time (i.e., with cost that is linear in the length of the strings), as shown by Vishwanathan and Smola [2003], by using data structures such as suffix trees or suffix arrays [Gusfield, 1997]. Moreover, with s fixed, any kernel k(s, t) may be computed in time O(|t|), which is particularly useful for classification applications. We will now see how Jensen-Tsallis kernels may be used as st... |

798 |
Kernel methods for Pattern Analysis
- Shawe-Taylor, Cristianini
- 2004
Citation Context ...kernels on text categorization tasks, in which documents are modeled both as bags-of-words and as sequences of characters. 1 Introduction: In kernel-based machine learning [Schölkopf and Smola, 2002, Shawe-Taylor and Cristianini, 2004], there has been recent interest in defining kernels on probability distributions, to tackle several problems involving structured data [Desobry et al., 2007, Moreno et al., 2004, Jebara et al., 2004... |

418 | Divergence Measures Based on the Shannon Entropy
- Lin
- 1991
Citation Context ...s. Letting Ψ in (8) be H, the Shannon entropy, the resulting Jensen difference JπH(p1, . . . , pm) is known as the JS divergence of p1, . . . , pm, with weights π1, . . . , πm (Burbea & Rao, 1982; Lin, 1991). In this instance of the Jensen difference, JπH(p1, . . . , pm) = I(X; Y), (9), where I(X; Y) = H(X) − H(X|Y) is the MI between X and Y (Banerjee et al., 2005). For m = 2 and π = (1/2, ... |
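The identity (9) in this excerpt can be spot-checked numerically: build the joint p(y = i, x) = πi pi(x), compute I(X; Y) from its definition, and compare with the weighted Jensen difference. A hedged sketch (function names and the example distributions are mine):

```python
import numpy as np

def H(p):
    """Shannon entropy of a discrete distribution (0 ln 0 = 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def js_divergence(ps, w):
    """Weighted Jensen difference with Psi = H: H(sum_i w_i p_i) - sum_i w_i H(p_i)."""
    ps, w = np.asarray(ps, float), np.asarray(w, float)
    return H(w @ ps) - np.sum(w * np.array([H(p) for p in ps]))

def mutual_information(ps, w):
    """I(X; Y) for P(Y=i) = w_i and X | Y=i ~ p_i, from the joint distribution."""
    ps, w = np.asarray(ps, float), np.asarray(w, float)
    joint = w[:, None] * ps                   # p(y, x)
    px = joint.sum(axis=0)                    # marginal p(x)
    prod = w[:, None] * px[None, :]           # p(y) p(x)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / prod[mask]))

ps = [[0.5, 0.5, 0.0], [0.1, 0.1, 0.8]]
w = [0.3, 0.7]
print(js_divergence(ps, w), mutual_information(ps, w))   # equal, per Eq. (9)
```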

391 |
Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms
- Joachims
- 2002
Citation Context ...that map data to statistical manifolds, equipped with well-motivated non-Euclidean metrics [Lafferty and Lebanon, 2005], often outperform support vector machine (SVM) classifiers with linear kernels [Joachims, 2002]. Some of these kernels have a natural information theoretic interpretation, establishing a bridge between kernel methods and information theory [Cuturi et al., 2005, Hein and Bousquet, 2005]. The ma... |

388 | Text classification using string kernels - Lodhi, Shawe-Taylor, et al. - 2000 |

373 | Convolution Kernels on Discrete Structures
- Haussler
- 1999
Citation Context ... them as “string kernels,” they are more generally kernels between stochastic processes. Several string kernels (i.e., kernels operating on the space of strings) have been proposed in the literature [Haussler, 1999, Lodhi et al., 2002, Leslie et al., 2002, Vishwanathan and Smola, 2003, Shawe-Taylor and Cristianini, 2004]. These are kernels defined on A* × A*, where A* is the Kleene closure of a finite alpha... |
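For intuition, one of the simplest kernels on A* × A* is the p-spectrum kernel: the inner product of length-p substring counts. The sketch below is a naive quadratic-space version for illustration; the suffix-tree and suffix-array constructions cited in this document compute such kernels in linear time:

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """p-spectrum kernel: k(s, t) = sum over length-p substrings u of
    count_s(u) * count_t(u)."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    # Only substrings occurring in both strings contribute.
    return sum(cs[u] * ct[u] for u in cs.keys() & ct.keys())

# "banana" 2-grams: ba, an(x2), na(x2); "bandana": ba, an(x2), nd, da, na.
print(spectrum_kernel("banana", "bandana", p=2))   # 1*1 + 2*2 + 2*1 = 7
```

Being an inner product of feature-count vectors, this kernel is positive definite by construction.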

340 |
On measures of entropy and information
- Renyi
- 1970
Citation Context ...on theory (Cuturi et al., 2005; Hein & Bousquet, 2005). We reinforce that bridge by introducing a new class of kernels rooted in nonextensive (NE) information theory. The Shannon and Rényi entropies (Rényi, 1961) share the extensivity property: the joint entropy of a pair of independent random variables equals the sum of the individual entropies. Abandoning this property yields the so-called NE entropies (Ha... |

311 | Clustering with Bregman divergences - Banerjee, Merugu, et al. - 2005 |

285 |
Possible generalization of Boltzmann–Gibbs statistics
- Tsallis
- 1988
Citation Context ...y property: the joint entropy of a pair of independent random variables equals the sum of the individual entropies. Abandoning this property yields the so-called NE entropies (Havrda & Charvát, 1967; Tsallis, 1988), which have raised great interest among physicists in modeling certain phenomena (e.g., long-range interactions and multifractals) and as generalizations of Boltzmann-Gibbs statistical mechanics (Ab... |

277 |
Methods of Information Geometry
- Amari, Nagaoka
- 2000
Citation Context ...two well-known string kernels. 7.6 The heat kernel approximation: The diffusion kernel for statistical manifolds, recently proposed by Lafferty and Lebanon [2005], is grounded in information geometry [Amari and Nagaoka, 2001]. It models the diffusion of “information” over a statistical manifold according to the heat equation. Since in the case of the multinomial manifold (the relative interior of ∆^n), the diffusion ker... |

259 |
I-divergence geometry of probability distributions and minimization problems, The Ann. Probab. 3
- Csiszár
- 1975
Citation Context ...where k > 0 is a constant, the function ϕH : R++ → R is defined as ϕH(y) = −k y ln y, (12), and, as usual, 0 ln 0 ≜ 0. The generalized form of the KL divergence, often called generalized I-divergence [Csiszar, 1975], is a directed divergence between two measures µf, µg ∈ M^H+(X), such that µf is µg-absolutely continuous (denoted µf ≪ µg). Let f and g be the densities associated with µf and µg, respectively. I... |

148 |
Mathematical foundations of information theory
- Khinchin
- 1957
Citation Context ...(X) ≜ (1/(1 − q)) ln Σ_{i=1}^n PX(xi)^q, (1), which includes Shannon’s entropy as a special case when q → 1. In classic information theory, extensivity is considered desirable, and is enforced axiomatically (Khinchin, 1957), to express the idea borrowed from thermodynamics that “independent systems add their entropies.” In contrast, the Tsallis entropies abandon the extensivity requirement (Tsallis, 1988). These NE ent... |

133 |
Harmonic analysis on semigroups
- Berg, Christensen, et al.
- 1984
Citation Context ...ollowing proposition, proved by Berg et al. (1984), will also be used below. A function f : X → R is called pd (resp. nd) if k : X × X → R, defined as k(x, y) = f(x + y), is a pd (resp. nd) kernel (Berg et al., 1984). Proposition 9: The function ζq : R++ → R, defined as ζq(y) = y^{−q}, is pd, for q ∈ [0, 1]. We now present the main contribution of this section, the family of weighted JT kernels, generalizing the JS ... |

105 | Probability product kernels
- Jebara, Kondor, et al.
Citation Context ... of these kernels on text categorization tasks. 1. Introduction: There has been recent interest in kernels on probability distributions, to tackle several classification problems (Moreno et al., 2003; Jebara et al., 2004; Hein & Bousquet, 2005; Lafferty & Lebanon, 2005; Cuturi et al., 2005). By mapping data points to fitted distributions in a parametric family where a kernel is defined, a kernel is automatically indu... |

94 | A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications
- Moreno, Ho, et al.
Citation Context ...trate the performance of these kernels on text categorization tasks. 1. Introduction: There has been recent interest in kernels on probability distributions, to tackle several classification problems (Moreno et al., 2003; Jebara et al., 2004; Hein & Bousquet, 2005; Lafferty & Lebanon, 2005; Cuturi et al., 2005). By mapping data points to fitted distributions in a parametric family where a kernel is defined, a kernel ... |

87 | Diffusion kernels on statistical manifolds
- Lafferty, Lebanon
- 2005
Citation Context ...ks. 1. Introduction: There has been recent interest in kernels on probability distributions, to tackle several classification problems (Moreno et al., 2003; Jebara et al., 2004; Hein & Bousquet, 2005; Lafferty & Lebanon, 2005; Cuturi et al., 2005). By mapping data points to fitted distributions in a parametric family where a kernel is defined, a kernel is automatically induced on the original input space. In text categori... |

84 | Quantification method of classification process, concept of structural α-entropy, Kybernetika 3
- Havrda, Charvát
- 1967
Citation Context ...61) share the extensivity property: the joint entropy of a pair of independent random variables equals the sum of the individual entropies. Abandoning this property yields the so-called NE entropies (Havrda & Charvát, 1967; Tsallis, 1988), which have raised great interest among physicists in modeling certain phenomena (e.g., long-range interactions and multifractals) and as generalizations of Boltzmann-Gibbs statistica... |

82 | Fast kernels for string and tree matching
- Viswanathan, Smola
- 2003
Citation Context ...s between stochastic processes. Several string kernels (i.e., kernels operating on the space of strings) have been proposed in the literature [Haussler, 1999, Lodhi et al., 2002, Leslie et al., 2002, Vishwanathan and Smola, 2003, Shawe-Taylor and Cristianini, 2004]. These are kernels defined on A* × A*, where A* is the Kleene closure of a finite alphabet A (i.e., the set of all finite strings formed by characters in A to... |

72 |
Some inequalities for information divergence and related measures of discrimination
- TOPSOE
- 2000
Citation Context ..., we denote the ensuing JS(p1, p2) = H((p1 + p2)/2) − (H(p1) + H(p2))/2. It can be shown that √JS satisfies the triangle inequality and is a Hilbertian metric (Endres & Schindelin, 2003; Topsøe, 2000), which has motivated its use in kernel-based machine learning. A metric d : X × X → R is Hilbertian if there is some Hilbert space H and an isometry f : X → H such that d²(x, y) = 〈f(x) − f(y), f(... |

68 |
Nonextensive Entropy: Interdisciplinary Applications
- Gell-Mann, Tsallis
- 2004
Citation Context ...and multifractals) and as generalizations of Boltzmann-Gibbs statistical mechanics (Abe, 2006). NE entropies have also been recently used in signal/image processing (Li et al., 2006) and other areas (Gell-Mann & Tsallis, 2004). The main contributions of this paper are: • Based on the new concept of q-convexity and a related q-Jensen inequality, we introduce the Jensen-Tsallis q-difference, a NE generalization of the Jense... |

62 |
Sur les fonctions convexes et les inégalités entre les valeurs moyennes
- Jensen
- 1906
Citation Context ...n information theory, e.g., the non-negativity of the Kullback-Leibler (KL) divergence (also called relative entropy), namely via the many implications of Jensen’s inequality [Cover and Thomas, 1991, Jensen, 1906]. Jensen’s inequality also underlies the concept of Jensen-Shannon (JS) divergence, which is a symmetrized and smoothed version of the KL divergence [Lin and Wong, 1990, Lin, 1991]. The JS divergence... |

57 |
On the convexity of some divergence measures based on entropy functions
- Burbea, Rao
- 1982
Citation Context ...itional distributions. Letting Ψ in (8) be H, the Shannon entropy, the resulting Jensen difference JπH(p1, . . . , pm) is known as the JS divergence of p1, . . . , pm, with weights π1, . . . , πm (Burbea & Rao, 1982; Lin, 1991). In this instance of the Jensen difference, JπH(p1, . . . , pm) = I(X; Y), (9), where I(X; Y) = H(X) − H(X|Y) is the MI between X and Y (Banerjee et al., 2005). For m = 2 and π = (1/2, ... |

51 | Hilbertian metrics and positive definite kernels on probability measures
- Hein, Bousquet
- 2005
Citation Context ...text categorization tasks. 1. Introduction: There has been recent interest in kernels on probability distributions, to tackle several classification problems (Moreno et al., 2003; Jebara et al., 2004; Hein & Bousquet, 2005; Lafferty & Lebanon, 2005; Cuturi et al., 2005). By mapping data points to fitted distributions in a parametric family where a kernel is defined, a kernel is automatically induced on the original inp... |

45 |
A new metric for probability distributions
- Endres, Schindelin
- 2003
Citation Context ...and π = (1/2, 1/2), we denote the ensuing Jensen difference as JS(p1, p2): JS(p1, p2) = H((p1 + p2)/2) − (H(p1) + H(p2))/2. It can be shown that √JS satisfies the triangle inequality and is a Hilbertian metric (Endres & Schindelin, 2003; Topsøe, 2000), which has motivated its use in kernel-based machine learning. A metric d : X × X → R is Hilbertian if there is some Hilbert space H and an isometry f : X → H such that d²(x, y) = ... |

40 | Analysis of symbolic sequences using the jensen-shannon divergence measure, Phys - Grosse, Bernaola-Galvan, et al. |

39 |
Generalized information functions
- DARÓCZY
- 1970
Citation Context ...H(p1, . . . , pn) = −k Σ_{i=1}^n pi ln pi, (3), and pseudoadditivity turns into additivity, i.e., H(A ⊗ B) = H(A) + H(B) holds. Several proposals for φ have appeared in the literature [Havrda and Charvát, 1967, Daróczy, 1970, Tsallis, 1988]. In the sequel, unless stated otherwise, we set φ(q) = q − 1, which yields the Tsallis entropy: Sq(p1, . . . , pn) = (k/(q − 1)) (1 − Σ_{i=1}^n pi^q). (4) To simplify, we let k = 1 a... |
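With k = 1, the pseudoadditivity referred to in this excerpt reads Sq(A ⊗ B) = Sq(A) + Sq(B) + (1 − q) Sq(A) Sq(B) for independent A and B, collapsing to ordinary additivity as q → 1. A quick numeric check (a sketch, not the paper's code; the example distributions are mine):

```python
import numpy as np

def tsallis(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1), with k = 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

a = np.array([0.2, 0.8])
b = np.array([0.5, 0.3, 0.2])
joint = np.outer(a, b).ravel()        # joint of independent A and B

for q in (0.5, 2.0, 3.0):
    lhs = tsallis(joint, q)
    rhs = tsallis(a, q) + tsallis(b, q) + (1 - q) * tsallis(a, q) * tsallis(b, q)
    print(q, lhs, rhs)                # pseudoadditivity: lhs == rhs
```

The identity follows because Σij (a_i b_j)^q factorizes as (Σi a_i^q)(Σj b_j^q).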

32 | Agnostic classification of markovian sequences - El-Yaniv, Fine, et al. - 1997 |

22 |
Semigroup kernels on measures
- Cuturi, Fukumizu, et al.
- 2005
Citation Context ...has been recent interest in kernels on probability distributions, to tackle several classification problems (Moreno et al., 2003; Jebara et al., 2004; Hein & Bousquet, 2005; Lafferty & Lebanon, 2005; Cuturi et al., 2005). By mapping data points to fitted distributions in a parametric family where a kernel is defined, a kernel is automatically induced on the original input space. In text categorization, this appears ... |

18 | Generalization of Shannon-Khinchin axioms to nonextensive systems and the uniqueness theorem for the nonextensive entropy - Suyari - 2004 |

15 |
The Cauchy-Schwarz Master Class
- Steele
- 2004
Citation Context ...d only if ϕq is convex and −1/ϕ″q is (2 − q)-convex. Since ϕ″q(x) = q x^{q−2}, ϕq is convex for x ≥ 0 and q ≥ 0. To show the (2 − q)-convexity of −1/ϕ″q(x) = −(1/q) x^{2−q}, for x ≥ 0 and q ∈ [0, 1], we use a version of the power mean inequality [Steele, 2006]: −(Σ_{i=1}^l λi xi)^{2−q} ≤ −Σ_{i=1}^l (λi xi)^{2−q} = −Σ_{i=1}^l λi^{2−q} xi^{2−q}, thus concluding that −1/ϕ″q is in fact (2 − q)-convex. The next ... |

13 |
A new directed divergence measure and its characterization
- Lin, Wong
- 1990
Citation Context ... inequality [Cover and Thomas, 1991, Jensen, 1906]. Jensen’s inequality also underlies the concept of Jensen-Shannon (JS) divergence, which is a symmetrized and smoothed version of the KL divergence [Lin and Wong, 1990, Lin, 1991]. The JS divergence is widely used in areas such as statistics, machine learning, image and signal processing, and physics. In this paper, we introduce new extensions of JS-type divergence... |

12 |
Semigroup kernels on finite sets
- Cuturi, Vert
- 2004
Citation Context ...: M+(X) → R, let the set M^G+(X) ≜ {f ∈ M+(X) : |G(f)| < ∞} be its effective domain, and M^{1,G}+(X) ≜ M^G+(X) ∩ M^1+(X) be its subdomain of probability measures. The following functional [Cuturi and Vert, 2005] extends the Shannon-Boltzmann-Gibbs entropy from M^{1,H}+ to the unnormalized measures in M^H+: H(f) = −k ∫ f ln f = ∫ ϕH ◦ f, (11), where k > 0 is a constant, the function ϕH : R++ → R is define... |

10 |
Foundations of nonextensive statistical mechanics
- Abe
- 2006
Citation Context ...icists in modeling certain phenomena (e.g., long-range interactions and multifractals) and in the construction of a nonextensive generalization of the classical Boltzmann-Gibbs statistical mechanics [Abe, 2006]. Nonextensive entropies have also been recently used in signal/image processing [Li et al., 2006] and many other areas [Gell-Mann and Tsallis, 2004]. The so-called Tsallis entropies [Havrda and Charv... |

9 | H.: Image Registration and Segmentation by Maximizing the Jensen-Rényi Divergence
- Hamza, Krim
- 2003
Citation Context ...(p1, p2) = Rq((p1 + p2)/2) − (Rq(p1) + Rq(p2))/2. (36) The JR divergence has been used in several signal/image processing applications, such as registration, segmentation, denoising, and classification [Ben-Hamza and Krim, 2003, He et al., 2003, Karakos et al., 2007]. In Section 7, we show that the JR divergence is (like the JS divergence) a Hilbertian metric, which is relevant for its use in kernel-based machine learning.... |

9 | Spirals in Hilbert space: with an application in information theory - Fuglede |

7 |
Information theoretical properties of Tsallis entropies
- Furuichi
- 2006
Citation Context ... called entropic index. While statistical physics has been the main application of Tsallis entropies, some attempts have been made to produce NE generalizations of classic information theory results (Furuichi, 2006). As for the Shannon entropy, the Tsallis joint and conditional entropies are defined as Sq(X, Y) ≜ −Eq[lnq PXY] and Sq(X|Y) ≜ −Eq[lnq PX|Y], respectively, and follow a chain rule Sq(X, Y) = Sq(... |
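The chain rule truncated in this excerpt is Sq(X, Y) = Sq(X) + Sq(Y|X) in Furuichi's formulation, with the conditional Tsallis entropy expanding as Sq(Y|X) = Σx p(x)^q Sq(Y|X = x). A numeric sketch (the joint distribution and names are mine):

```python
import numpy as np

def tsallis(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1), k = 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

# An arbitrary (dependent) joint distribution p(x, y).
pxy = np.array([[0.10, 0.25],
                [0.30, 0.05],
                [0.20, 0.10]])
q = 1.5
px = pxy.sum(axis=1)

# Conditional Tsallis entropy: S_q(Y|X) = sum_x p(x)^q S_q(Y | X = x).
s_cond = sum(px[i] ** q * tsallis(pxy[i] / px[i], q) for i in range(len(px)))
print(tsallis(pxy.ravel(), q), tsallis(px, q) + s_cond)   # chain rule: equal
```

Expanding the sums shows why it holds: the Σ p(x)^q terms cancel, leaving (1 − Σ p(x, y)^q)/(q − 1) on both sides.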

7 |
A nonextensive information-theoretic measure for image edge detection
- Hamza
- 2006
Citation Context ...JRq is also a Hilbertian metric. Jensen-Tsallis (JT) Divergence: Divergences of the form (8), with Ψ = Sq, are known as JT divergences (Burbea & Rao, 1982) and were recently used in image processing (Hamza, 2006). Unlike the JS divergence, the JT divergence lacks a MI interpretation; in Sec. 4, we introduce an alternative to the JT divergence, which is interpretable as a NE MI in the sense of Furuichi (2006)... |

7 | Hilbertian metrics on probability measures and their application in svms
- Hein, Lal, et al.
- 2004
Citation Context ...ing heat kernel approximation is k_heat(p1, p2) = (4πτ)^{−n/2} exp(−d²g(p1, p2)/(4τ)), (28), where τ > 0 and dg(p1, p2) = 2 arccos(Σi √(p1i p2i)). Whether k_heat is pd has been an open problem (Hein et al., 2004; Zhang et al., 2005). Proposition 11: Let n ≥ 2. For sufficiently large τ, the kernel k_heat is not pd. Proof: From Prop. 4, k_heat is pd, for all τ > 0, if and only if d²g is nd. We provide a counter... |
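Equation (28), as reconstructed here, is a Gaussian in the geodesic distance dg on the multinomial simplex. A hedged sketch (names are mine; τ plays the role of the diffusion time, n the manifold dimension):

```python
import numpy as np

def geodesic_distance(p1, p2):
    """d_g(p1, p2) = 2 arccos(sum_i sqrt(p1_i * p2_i)) on the simplex."""
    inner = np.clip(np.sum(np.sqrt(p1 * p2)), 0.0, 1.0)   # guard rounding
    return 2.0 * np.arccos(inner)

def heat_kernel_approx(p1, p2, tau, n):
    """k_heat = (4*pi*tau)^(-n/2) * exp(-d_g^2 / (4*tau)), per Eq. (28)."""
    return (4 * np.pi * tau) ** (-n / 2) * np.exp(
        -geodesic_distance(p1, p2) ** 2 / (4 * tau))

p1 = np.array([0.5, 0.5])
p2 = np.array([0.1, 0.9])
print(heat_kernel_approx(p1, p2, tau=0.1, n=1))   # n = dim of the simplex
```

The excerpt's Proposition 11 says this kernel fails to be pd for large τ, which is why the approximation must be handled with care in kernel machines.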

6 | Iterative denoising using Jensen-Rényi divergences with an application to unsupervised document categorization
- Karakos, Khudanpur, et al.
- 2007
Citation Context ...nce. When m = 2 and π = (1/2, 1/2), we write JπRq(p) = JRq(p1, p2), where JRq(p1, p2) = Rq((p1 + p2)/2) − (Rq(p1) + Rq(p2))/2. The JR divergence has been used in signal processing applications (Karakos et al., 2007). We show in Sect. 5.3 that √JRq is also a Hilbertian metric. Jensen-Tsallis (JT) Divergence: Divergences of the form (8), with Ψ = Sq, are known as JT divergences (Burbea & Rao, 1982) and were rece... |

6 |
Image segmentation based on Tsallisentropy and Renyi-entropy and their comparison
- Li, Fan, et al.
Citation Context ...na (e.g., long-range interactions and multifractals) and as generalizations of Boltzmann-Gibbs statistical mechanics (Abe, 2006). NE entropies have also been recently used in signal/image processing (Li et al., 2006) and other areas (Gell-Mann & Tsallis, 2004). The main contributions of this paper are: • Based on the new concept of q-convexity and a related q-Jensen inequality, we introduce the Jensen-Tsallis q-... |

5 | Non-logarithmic Jensen-Shannon divergence - Lamberti, Majtey - 2003 |

3 | Density kernels on unordered sets for kernel-based signal processing
- Desobry, Davy, et al.
- 2007
Citation Context ...[Schölkopf and Smola, 2002, Shawe-Taylor and Cristianini, 2004], there has been recent interest in defining kernels on probability distributions, to tackle several problems involving structured data [Desobry et al., 2007, Moreno et al., 2004, Jebara et al., 2004, Hein and Bousquet, 2005, Lafferty and Lebanon, 2005, Cuturi et al., 2005]. By defining a parametric family S containing the distributions from which the dat... |

2 |
A nonextensive information-theoretic measure for image edge detection
- Ben-Hamza
- 2006
Citation Context ...(38); there is no counterpart of the equality (29). When X and T are finite, JπSq in (37) is called the Jensen-Tsallis (JT) divergence and it has also been applied in image processing [Ben-Hamza, 2006]. Unlike the JS divergence, the JT divergence lacks an interpretation as a mutual information. Despite this, for q ∈ [1, 2], it exhibits joint convexity [Burbea and Rao, 1982]. In the next section, w... |

2 |
On the Theory of Measurement and its Consequences in Statistical Dynamics
- Lindhard
- 1974
Citation Context ...the joint entropy of a pair of independent random variables equals the sum of the individual entropies. Abandoning this property yields the so-called nonextensive entropies [Havrda and Charvát, 1967, Lindhard, 1974, Lindhard and Nielsen, 1971, Tsallis, 1988], which have raised great interest among physicists in modeling certain phenomena (e.g., long-range interactions and multifractals) and in the construction ... |

2 | Tsallis kernels on measures - Martins, Aguiar, et al. |
