Results 1–10 of 16
Divergence measures based on the Shannon entropy
IEEE Transactions on Information Theory, 1991
Cited by 619 (0 self)
Abstract: A new class of information-theoretic divergence measures based on the Shannon entropy is introduced. Unlike the well-known Kullback divergences, the new measures do not require the condition of absolute continuity to be satisfied by the probability distributions involved. More importantly, their close relationship with the variational distance and the probability of misclassification error is established in terms of bounds. These bounds are crucial in many applications of divergence measures. The new measures are also well characterized by the properties of nonnegativity, finiteness, semiboundedness, and boundedness.
Index Terms: Divergence, dissimilarity measure, discrimination information, entropy, probability of error bounds.
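The best-known member of this class is the Jensen-Shannon divergence, which compares each distribution to their average. A minimal illustrative sketch (not code from the paper) shows the property claimed in the abstract: it stays finite even when the supports are disjoint, where the Kullback-Leibler divergence would be infinite.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete
    distributions. Finite even without absolute continuity."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)  # the mixture both inputs are compared against

    def kl(a, b):
        mask = a > 0  # 0 * log(0/x) = 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Disjoint supports: KL(p || q) is infinite, but JS is exactly 1 bit.
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0
```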
Learning from examples with Information Theoretic Criteria
Journal of VLSI Systems, Kluwer, 1999
Cited by 30 (8 self)
Abstract: This paper discusses a framework for learning based on information-theoretic criteria. A novel algorithm based on Renyi’s quadratic entropy is used to train, directly from a data set, linear or nonlinear mappers for entropy maximization or minimization. We provide an intriguing analogy between the computation and an information potential measuring the interactions among the data samples. We also propose two approximations to the Kullback-Leibler divergence based on quadratic distances (Cauchy-Schwarz inequality and Euclidean distance). These distances can still be computed using the information potential. We test the newly proposed distances in blind source separation (unsupervised learning) and in feature extraction for classification (supervised learning). In blind source separation our algorithm is capable of separating instantaneously mixed sources, and for classification the performance of our classifier is comparable to that of support vector machines (SVMs).
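The "information potential" mentioned here is the Parzen-window estimate of the integral of the squared density: with Gaussian kernels it reduces to a mean over pairwise kernel evaluations, and Renyi's quadratic entropy is its negative log. A rough sketch under that standard construction (parameter names are ours, not the paper's):

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=1.0):
    """Parzen estimate of Renyi's quadratic entropy H2 = -log V, where
    V (the 'information potential') is the mean pairwise Gaussian kernel
    over all sample pairs, with the kernel variance doubled because the
    estimate is a convolution of two Parzen kernels."""
    x = np.asarray(samples, float).reshape(len(samples), -1)
    d = x.shape[1]
    diff = x[:, None, :] - x[None, :, :]   # all pairwise differences
    sq = np.sum(diff**2, axis=-1)
    var = 2.0 * sigma**2
    kern = np.exp(-sq / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2)
    V = kern.mean()                        # information potential
    return -np.log(V)

rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.1, size=200)   # concentrated -> lower entropy
spread = rng.normal(0.0, 2.0, size=200)  # dispersed    -> higher entropy
print(renyi_quadratic_entropy(tight) < renyi_quadratic_entropy(spread))  # True
```

Maximizing or minimizing this quantity with respect to a mapper's parameters is what drives the samples apart or pulls them together, which is the "interaction" analogy in the abstract.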
On the Relationship between Classification Error Bounds and Training Criteria in Statistical Pattern Recognition
2003
Cited by 6 (5 self)
Abstract: We present two novel bounds for the classification error that, at the same time, can be used as practical training criteria. Unlike the bounds reported in the literature so far, these novel bounds are based on a strict distinction between the true but unknown distribution and the model distribution, which is used in the decision rule. The two bounds we derive are the squared distance and the Kullback-Leibler distance, where in both cases the distance is computed between the true distribution and the model distribution. In terms of practical training criteria, these bounds result in the squared error criterion and the mutual information (or equivocation) criterion, respectively.
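The two distances the abstract names are straightforward to compute between a true posterior and a model posterior; a toy sketch (our own illustration, not the paper's derivation) shows that both shrink as the model approaches the truth, which is what makes them usable as training criteria:

```python
import numpy as np

def squared_distance(p_true, p_model):
    """Squared distance between true and model class posteriors;
    minimizing it is the squared error criterion."""
    return np.sum((np.asarray(p_true, float) - np.asarray(p_model, float)) ** 2)

def kl_distance(p_true, p_model):
    """KL divergence D(p_true || p_model); minimizing it over the model
    corresponds to the mutual information (equivocation) criterion."""
    p, q = np.asarray(p_true, float), np.asarray(p_model, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p_true = [0.7, 0.2, 0.1]
good   = [0.6, 0.3, 0.1]   # close to the true posterior
bad    = [0.2, 0.3, 0.5]   # far from it
print(squared_distance(p_true, good) < squared_distance(p_true, bad))  # True
print(kl_distance(p_true, good) < kl_distance(p_true, bad))            # True
```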
Mixing and non-mixing local minima of the entropy contrast for blind source separation
IEEE Transactions on Information Theory, 2007
Cited by 5 (0 self)
Abstract: In this paper, both non-mixing and mixing local minima of the entropy are analyzed from the viewpoint of blind source separation (BSS); they correspond respectively to acceptable and spurious solutions of the BSS problem. The contribution of this work is twofold. First, a Taylor expansion is used to show that the exact output entropy cost function has a non-mixing minimum when this output is proportional to any of the non-Gaussian sources, and not only when the output is proportional to the lowest entropic source. Second, in order to prove that mixing entropy minima exist when the source densities are strongly multimodal, an entropy approximator is proposed. The latter has the major advantage that an error bound can be provided. Even if this approximator (and the associated bound) is used here in the BSS context, it can be applied for estimating the entropy of any random variable with multimodal density.
Index Terms: Blind source separation, independent component analysis, entropy estimation, multimodal densities, mixture distribution.
USING EXPONENTIAL MIXTURE MODELS FOR SUBOPTIMAL DISTRIBUTED DATA FUSION
Cited by 1 (0 self)
Abstract: In this paper we investigate the use of Exponential Mixture Densities (EMDs) as suboptimal update rules for distributed data fusion. We show that EMDs have a pointwise bound “from below” on the minimum value of the probability distribution. However, the distributions are not bounded from above and thus can be interpreted as a fusion operation.
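An EMD fuses two densities through a normalized weighted geometric mean, p_fused(x) ∝ p1(x)^ω · p2(x)^(1−ω) with ω in [0, 1]. A small sketch for discrete distributions (our illustration of the standard EMD rule, not the paper's implementation) demonstrates the pointwise lower bound the abstract refers to:

```python
import numpy as np

def emd_fuse(p1, p2, omega):
    """Exponential mixture of two discrete distributions:
    p_fused(x) proportional to p1(x)**omega * p2(x)**(1 - omega)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    unnorm = p1**omega * p2**(1.0 - omega)   # weighted geometric mean
    return unnorm / unnorm.sum()             # renormalize

p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.2, 0.3, 0.5])
fused = emd_fuse(p1, p2, omega=0.5)
# Pointwise, the fused density is at least the smaller of the two inputs.
print(np.all(fused >= np.minimum(p1, p2)))  # True
```

The geometric mean already dominates the pointwise minimum, and the normalizer is at most 1 (by Hölder's inequality), so renormalization only raises the fused values further.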
A Discriminative Splitting Criterion for Phonetic Decision Trees
Abstract: Phonetic decision trees are a key concept in acoustic modeling for large vocabulary continuous speech recognition. Although discriminative training has become a major line of research in speech recognition and all state-of-the-art acoustic models are trained discriminatively, the conventional phonetic decision tree approach still relies on the maximum likelihood principle. In this paper we develop a splitting criterion based on the minimization of the classification error. An improvement of more than 10% relative over a discriminatively trained baseline system on the Wall Street Journal corpus suggests that the proposed approach is promising.
Index Terms: discriminative training, phonetic decision trees, state tying, new paradigms
An entropy-based learning algorithm of Bayesian conditional trees
Abstract: This article offers a modification of Chow and Liu’s learning algorithm in the context of handwritten digit recognition. The modified algorithm groups digits into several classes consisting of digits that are hard to distinguish and then constructs an optimal conditional tree representation for each class of digits, instead of for each single digit as done by Chow and Liu (1968). Advantages and extensions of the new method are discussed. Related works of Wong and Wang (1977) and Wong and Poon (1989), which offer a different entropy-based learning algorithm, are shown to rest on inappropriate assumptions.
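The Chow-Liu algorithm this entry builds on finds the tree-structured distribution closest in KL divergence to the data by taking a maximum spanning tree over pairwise mutual information. A compact sketch of that base algorithm (our own toy version, not the article's modified one):

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete samples."""
    mi = 0.0
    for a in set(x):
        for b in set(y):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_edges(data):
    """Chow-Liu tree: maximum spanning tree over pairwise mutual
    information (Kruskal's algorithm with a simple union-find)."""
    n_vars = data.shape[1]
    weights = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u
    edges = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:          # keep the edge only if it joins two components
            parent[ri] = rj
            edges.append((i, j))
    return edges

rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, 500)
x2 = rng.integers(0, 2, 500)
data = np.column_stack([x0, x0, x2])  # column 1 is a copy of column 0
print(chow_liu_edges(data))           # the strongest edge links columns 0 and 1
```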
ESTIMATING COGNITIVE STATE USING EEG SIGNALS
Abstract: Using EEG signals to estimate cognitive state has drawn increasing attention in recent years, especially in the context of brain-computer interface (BCI) design. However, this goal is extremely difficult because, in addition to the complex relationships between the cognitive state and EEG signals that yield the non-stationarity of the features extracted from EEG signals, there are artefacts introduced by eye blinks and head and body motion. In this paper, we present a classification system which can estimate the subject’s cognitive state from the measured EEG signals. In the proposed system, a mutual-information-based method is employed to reduce the dimensionality of the features as well as to increase the robustness of the system. A committee of three classifiers was implemented, and the majority-voting results of the committee are taken to be the final decisions. The results of a preliminary test with data from freely moving subjects performing various tasks, as opposed to the strictly controlled experimental setups of BCI, provide strong support for this approach.
A Tighter Bhattacharyya Bound for Decoding Error Probability
Miguel Griot, Student Member, IEEE, Wen-Yen Weng, Student Member, IEEE,
Abstract: The Bhattacharyya bound has been widely used to upper bound the pairwise probability of error when transmitting over a noisy channel. However, the bound as it appears in most textbooks on channel coding can be improved by a factor of 1/2 when applied to the frame error probability. For the particular case of symmetric channels, the pairwise error probability can also be improved by a factor of 1/2. This letter provides a simple proof of these tighter bounds that has the same simplicity as the proof of the standard Bhattacharyya bound currently found in textbooks.
Index Terms: Channel coding, Bhattacharyya bound, error probability.
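For two discrete channel output distributions with equal priors, the exact pairwise error of the maximum-likelihood rule is (1/2)·Σ min(p, q), the standard Bhattacharyya bound is BC = Σ √(pq), and the tightened version is BC/2 (since min(p, q) ≤ √(pq) pointwise). A quick numerical sketch of this ordering (our illustration, not the letter's proof):

```python
import numpy as np

def bhattacharyya_bounds(p, q):
    """Exact pairwise ML error (equal priors) vs. the tightened
    Bhattacharyya bound BC/2 and the standard bound BC."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient
    exact = 0.5 * np.sum(np.minimum(p, q))   # exact ML error, equal priors
    return exact, 0.5 * bc, bc

p = [0.8, 0.1, 0.1]   # output distribution under codeword 1
q = [0.1, 0.1, 0.8]   # output distribution under codeword 2
exact, tight, standard = bhattacharyya_bounds(p, q)
print(exact <= tight <= standard)  # True
```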