Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
Cited by 310 (52 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
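The hard clustering scheme described in this abstract can be sketched as a k-means-style loop in which only the assignment step depends on the chosen divergence, while the centroid update remains the arithmetic mean. The following is a minimal illustration under assumptions, not the authors' implementation: the function names, the two example divergences, and the stopping rule are all hypothetical choices made for this sketch.

```python
import math

def squared_euclidean(x, mu):
    # Bregman divergence generated by phi(x) = ||x||^2
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def generalized_kl(x, mu):
    # Generalized I-divergence; assumes strictly positive coordinates
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def bregman_hard_clustering(points, init_centroids, divergence, max_iter=100):
    """Alternate nearest-centroid assignment and mean updates (illustrative sketch)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen divergence
        new_labels = [
            min(range(len(centroids)), key=lambda j: divergence(x, centroids[j]))
            for x in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: the arithmetic mean, whatever divergence is used
        for j in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
init = [(1.0, 1.0), (5.0, 5.0)]
labels_euclid, centroids_euclid = bregman_hard_clustering(points, init, squared_euclidean)
labels_kl, centroids_kl = bregman_hard_clustering(points, init, generalized_kl)
```

Swapping `squared_euclidean` for `generalized_kl` changes which points are considered close, but leaves the rest of the loop untouched.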
Convergence of a block coordinate descent method for nondifferentiable minimization
J. Optim. Theory Appl., 2001
Cited by 115 (2 self)
We study the convergence properties of a (block) coordinate descent method applied to minimize a nondifferentiable (nonconvex) function f(x1,...,xN) with certain separability and regularity properties. Assuming that f is continuous on a compact level set, the subsequence convergence of the iterates to a stationary point is shown when either f is pseudoconvex in every pair of coordinate blocks from among N-1 coordinate blocks or f has at most one minimum in each of N-2 coordinate blocks. If f is quasiconvex and hemivariate in every coordinate block, then the assumptions of continuity of f and compactness of the level set may be relaxed further. These results are applied to derive new (and old) convergence results for the proximal minimization algorithm, an algorithm of Arimoto and Blahut, and an algorithm of Han. They are applied also to a problem of blind source separation. Key Words. Block coordinate descent, nondifferentiable minimization, stationary point, Gauss–Seidel method, convergence, quasiconvex functions,
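As a concrete instance of cyclic coordinate descent on a function with a separable nondifferentiable term, the sketch below minimizes an L1-regularized least-squares objective one coordinate at a time. The choice of objective, the function names, and the fixed number of sweeps are assumptions made for illustration; the paper treats a much more general class of functions.

```python
def soft_threshold(v, lam):
    # Closed-form minimizer of 0.5*(w - v)^2 + lam*|w|
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def lasso_objective(X, y, w, lam):
    resid = [yi - sum(xij * wj for xij, wj in zip(row, w)) for row, yi in zip(X, y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(wj) for wj in w)

def coordinate_descent_lasso(X, y, lam, sweeps=100):
    """Cyclic (Gauss-Seidel style) exact minimization over one coordinate at a time."""
    n_features = len(X[0])
    w = [0.0] * n_features
    history = [lasso_objective(X, y, w, lam)]
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0  # correlation of column j with the residual that excludes w[j]
            z = 0.0    # squared norm of column j (assumed nonzero)
            for row, yi in zip(X, y):
                partial = yi - sum(row[k] * w[k] for k in range(n_features) if k != j)
                rho += row[j] * partial
                z += row[j] ** 2
            w[j] = soft_threshold(rho, lam) / z
        history.append(lasso_objective(X, y, w, lam))
    return w, history

w_identity, _ = coordinate_descent_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
w_general, history = coordinate_descent_lasso(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [1.0, 2.0, 3.0], lam=0.5
)
```

Because each coordinate update is an exact minimization, the recorded objective values in `history` should never increase from one sweep to the next.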
Simulation-based computation of information rates for channels with memory
IEEE Trans. Inform. Theory, 2006
Cited by 54 (11 self)
The information rate of finite-state source/channel models can be accurately estimated by sampling both a long channel input sequence and the corresponding channel output sequence, followed by a forward sum–product recursion on the joint source/channel trellis. This method is extended to compute upper and lower bounds on the information rate of very general channels with memory by means of finite-state approximations. Further upper and lower bounds can be computed by reduced-state methods.
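One ingredient of the simulation-based method is estimating an entropy rate as -(1/n) log2 p(y_1, ..., y_n), with the sequence probability computed by a normalized forward recursion over the states. The sketch below shows only that ingredient for a generic hidden Markov model; the uniform initial state distribution, the function names, and the toy parameters are assumptions, and a full information-rate estimate would combine two such entropy-rate terms.

```python
import math
import random

def sample_hmm(trans, emit, n, rng):
    """Draw n output symbols from a finite-state model with a uniform initial state."""
    m = len(trans)
    state = rng.randrange(m)
    ys = []
    for _ in range(n):
        state = rng.choices(range(m), weights=trans[state])[0]
        ys.append(rng.choices(range(len(emit[state])), weights=emit[state])[0])
    return ys

def entropy_rate_estimate(trans, emit, ys):
    """Estimate -(1/n) log2 p(y_1..y_n) with a normalized forward recursion."""
    m = len(trans)
    alpha = [1.0 / m] * m  # state belief before the first transition (assumed uniform)
    log_p = 0.0
    for y in ys:
        # Predict the next state, then weight by the emission probability of y
        pred = [sum(alpha[i] * trans[i][j] for i in range(m)) for j in range(m)]
        unnorm = [pred[j] * emit[j][y] for j in range(m)]
        scale = sum(unnorm)  # equals p(y_k | y_1..y_{k-1})
        log_p += math.log2(scale)
        alpha = [u / scale for u in unnorm]
    return -log_p / len(ys)

rng = random.Random(0)
trans = [[0.9, 0.1], [0.2, 0.8]]
emit_uniform = [[0.5, 0.5], [0.5, 0.5]]
ys = sample_hmm(trans, emit_uniform, 1000, rng)
rate_uniform = entropy_rate_estimate(trans, emit_uniform, ys)
```

Normalizing `alpha` at every step keeps the recursion numerically stable for long sequences, since the per-step scale factors are accumulated in the log domain.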
APPROXIMATING THE KULLBACK LEIBLER DIVERGENCE BETWEEN GAUSSIAN MIXTURE MODELS
Cited by 46 (0 self)
The Kullback Leibler (KL) Divergence is a widely used tool in statistics and pattern recognition. The KL divergence between two Gaussian Mixture Models (GMMs) is frequently needed in the fields of speech and image recognition. Unfortunately, the KL divergence between two GMMs is not analytically tractable, nor does any efficient computational algorithm exist. Some techniques cope with this problem by replacing the KL divergence with other functions that can be computed efficiently. We introduce two new methods, the variational approximation and the variational upper bound, and compare them to existing methods. We discuss seven different techniques in total and weigh the benefits of each one against the others. To conclude, we evaluate the performance of each one through numerical experiments. Index Terms: Kullback Leibler divergence, variational methods, Gaussian mixture models, unscented transformation.
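Two of the approaches mentioned here can be sketched for one-dimensional mixtures: a Monte Carlo estimate that samples from the first mixture, and a variational approximation built from the closed-form KL divergences between individual Gaussian components. This is a hedged illustration: the mixture representation as `(weight, mean, std)` tuples and the function names are assumptions, and the variational formula is written as commonly stated for this method rather than copied from the paper.

```python
import math
import random

def gauss_kl(m1, s1, m2, s2):
    # Closed-form KL( N(m1, s1^2) || N(m2, s2^2) ) in nats
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def gmm_pdf(x, gmm):
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in gmm
    )

def kl_monte_carlo(f, g, n, rng):
    """Average log f(x)/g(x) over n samples drawn from f."""
    weights = [w for w, _, _ in f]
    total = 0.0
    for _ in range(n):
        _, m, s = rng.choices(f, weights=weights)[0]
        x = rng.gauss(m, s)
        total += math.log(gmm_pdf(x, f) / gmm_pdf(x, g))
    return total / n

def kl_variational(f, g):
    """Variational approximation from pairwise component KL divergences."""
    total = 0.0
    for wa, ma, sa in f:
        num = sum(w2 * math.exp(-gauss_kl(ma, sa, m2, s2)) for w2, m2, s2 in f)
        den = sum(wb * math.exp(-gauss_kl(ma, sa, mb, sb)) for wb, mb, sb in g)
        total += wa * math.log(num / den)
    return total

f = [(0.5, 0.0, 1.0), (0.5, 3.0, 0.5)]
g = [(0.3, 0.5, 1.0), (0.7, 2.5, 1.0)]
kl_mc = kl_monte_carlo(f, g, 20000, random.Random(0))
kl_var = kl_variational(f, g)
```

When both mixtures have a single component, `kl_variational` reduces to the exact Gaussian KL divergence, which gives a quick sanity check.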
Optimum power allocation for parallel Gaussian channels with arbitrary input distributions
IEEE Trans. Inf. Theory, 2006
Cited by 35 (9 self)
The mutual information of independent parallel Gaussian-noise channels is maximized, under an average power constraint, by independent Gaussian inputs whose power is allocated according to the waterfilling policy. In practice, discrete signaling constellations with limited peak-to-average ratios (m-PSK, m-QAM, etc.) are used in lieu of the ideal Gaussian signals. This paper gives the power allocation policy that maximizes the mutual information over parallel channels with arbitrary input distributions. Such policy admits a graphical interpretation, referred to as mercury/waterfilling, which generalizes the waterfilling solution and allows retaining some of its intuition. The relationship between mutual information of Gaussian channels and nonlinear minimum mean-square error (MMSE) proves key to solving the power allocation problem.
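For reference, the classical waterfilling policy that this abstract takes as its starting point can be sketched as a bisection search on the water level. This covers only the Gaussian-input baseline, not the mercury/waterfilling policy of the paper; the function name, the normalization of gains (noise power folded into each gain), and the bisection iteration count are assumptions.

```python
def waterfilling(gains, total_power, iters=200):
    """Allocate p_i = max(0, level - 1/g_i) so that the powers sum to total_power."""
    inv = [1.0 / g for g in gains]       # per-channel "floor" heights
    lo, hi = 0.0, max(inv) + total_power  # at hi, allocated power is at least total_power
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        used = sum(max(0.0, level - v) for v in inv)
        if used > total_power:
            hi = level
        else:
            lo = level
    level = 0.5 * (lo + hi)
    return [max(0.0, level - v) for v in inv]

powers_high = waterfilling([1.0, 0.5], total_power=3.0)
powers_low = waterfilling([1.0, 0.1], total_power=1.0)
```

With a small power budget the weaker channel receives nothing, which is the behavior the water-level picture is meant to convey.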
Information-theoretic image formation
IEEE Transactions on Information Theory, 1998
Cited by 28 (5 self)
The emergent role of information theory in image formation is surveyed. Unlike the subject of information-theoretic communication theory, information-theoretic imaging is far from a mature subject. The possible role of information theory in problems of image formation is to provide a rigorous framework for defining the imaging problem, for defining measures of optimality used to form estimates of images, for addressing issues associated with the development of algorithms based on these optimality criteria, and for quantifying the quality of the approximations. The definition of the imaging problem consists of an appropriate model for the data and an appropriate model for the reproduction space, which is the space within which image estimates take values. Each problem statement has an associated optimality criterion that measures the overall quality of an estimate. The optimality criteria include maximizing the likelihood function and minimizing mean squared error for stochastic problems, and minimizing squared error and discrimination for deterministic problems. The development of algorithms is closely tied to the definition of the imaging problem and the associated optimality criterion. Algorithms with a strong information-theoretic motivation are obtained by the method of expectation maximization. Related alternating minimization algorithms are discussed. In quantifying the quality of approximations, global and local measures are discussed. Global measures include the (mean) squared error and discrimination between an estimate and the truth, and probability of error for recognition or hypothesis testing problems. Local measures include Fisher information. Index Terms: Image analysis, image formation, image processing, image reconstruction, image restoration, imaging, inverse problems, maximum-likelihood estimation, pattern recognition.
Detecting covert timing channels: an entropy-based approach
ACM Conference on Computer and Communications Security, 2007
Cited by 28 (4 self)
The detection of covert timing channels is of increasing interest in light of recent practice on the exploitation of covert timing channels over the Internet. However, due to the high variation in legitimate network traffic, detecting covert timing channels is a challenging task. The existing detection schemes are ineffective at detecting most of the covert timing channels known to the security community. In this paper, we introduce a new entropy-based approach to detecting various covert timing channels. Our new approach is based on the observation that the creation of a covert timing channel has certain effects on the entropy of the original process, and hence, a change in the entropy of a process provides a critical clue for covert timing channel detection. Exploiting this observation, we investigate the use of entropy and conditional entropy in detecting covert timing channels. Our experimental results show that our entropy-based approach is sensitive to the current covert timing channels, and is capable of detecting them in an accurate manner.
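The two quantities named in this abstract, entropy and conditional entropy, can be estimated from a sequence of binned inter-packet delays as in the sketch below. It uses plain plug-in estimates over symbols and consecutive symbol pairs; the binning into integer symbols is assumed to have happened already, the function names are hypothetical, and the paper's actual estimator may differ from this simple version.

```python
import math
import random
from collections import Counter

def entropy(symbols):
    """Plug-in (empirical) entropy in bits of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def conditional_entropy(symbols):
    """Plug-in estimate of H(X_k | X_{k-1}) from consecutive pairs."""
    pairs = list(zip(symbols, symbols[1:]))
    firsts = [a for a, _ in pairs]
    return entropy(pairs) - entropy(firsts)

# A perfectly regular timing pattern versus an irregular one (toy data)
regular = [0, 1] * 50
rng = random.Random(0)
irregular = [rng.randrange(2) for _ in range(2000)]

h_regular = entropy(regular)
ce_regular = conditional_entropy(regular)
ce_irregular = conditional_entropy(irregular)
```

The regular pattern has high first-order entropy but near-zero conditional entropy, which illustrates why the conditional measure can reveal regularity that the marginal distribution hides.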
Natural Type Selection in Adaptive Lossy Compression
2000
Cited by 13 (4 self)
Consider approximate (lossy) matching of a source string ~ $P$, with a random codebook generated from reproduction distribution $Q$, at a specified distortion $d$. Recent work determined the minimum coding rate, $R_1 = R(P, Q, d)$, for this setting. We observe that for large word length and with high probability, the matching codeword is typical with a distribution $Q_1$ which is different from $Q$. If a new random codebook is generated according to $Q_1$, then the source string will favor codewords which are typical with a new distribution $Q_2$, resulting in minimum coding rate $R_2 = R(P, Q_1, d)$, and so on. We show that the sequences of distributions $Q_1, Q_2, \ldots$ and rates $R_1, R_2, \ldots$, generated by this procedure, converge to an optimum reproduction distribution $Q^*$ and the rate-distortion function $R(P, d)$, respectively. We also derive a fixed rate-distortion slope version of this natural type selection process. In the latter case, an iteration of the process stochastically simulates a...
Metabolically efficient information processing
Neural Comput., 2001
Cited by 13 (2 self)
Energy efficient information transmission may be relevant to biological sensory signal processing as well as to low power electronic devices. We explore its consequences in two different regimes. In an “immediate” regime, we argue that the information rate should be maximized subject to a power constraint, while in an “exploratory” regime, the transmission rate per power cost should be maximized. In the absence of noise, discrete inputs are optimally encoded into Boltzmann distributed output symbols. In the exploratory regime, the partition function of this distribution is numerically equal to 1. The structure of the optimal code is strongly affected by noise in the transmission channel. The Arimoto-Blahut algorithm, generalized for cost constraints, can be used to derive and interpret the distribution of symbols for optimal energy efficient coding in the presence of noise. We outline the possibilities and problems in extending our results to information coding and transmission in neurobiological systems.
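The Arimoto-Blahut iteration referred to above can be sketched in its basic, unconstrained form, which computes the capacity of a discrete memoryless channel. The cost-constrained generalization used in the paper is not shown; the function name, the tolerance, and the representation of the channel as a row-stochastic list of lists `W[x][y]` are assumptions made for this sketch.

```python
import math

def blahut_arimoto(W, tol=1e-10, max_iter=10000):
    """Return (capacity in bits, input distribution) for channel matrix W[x][y]."""
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # c[x] = exp( KL( W(.|x) || q ) ), skipping zero-probability outputs
        c = []
        for x in range(n_in):
            d = sum(
                W[x][y] * math.log(W[x][y] / q[y])
                for y in range(n_out)
                if W[x][y] > 0
            )
            c.append(math.exp(d))
        norm = sum(p[x] * c[x] for x in range(n_in))
        lower = math.log(norm)    # lower bound on capacity (nats)
        upper = math.log(max(c))  # upper bound on capacity (nats)
        p = [p[x] * c[x] / norm for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), p

# Binary symmetric channel with crossover 0.1, and a Z-channel with crossover 0.5
cap_bsc, p_bsc = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
cap_z, p_z = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
```

The gap between the two bounds gives a natural stopping criterion, since the true capacity always lies between them.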
Computation of Information Rates from Finite-State Source/Channel Models
Proc. 40th Annual Allerton Conference on Communication, Control, and Computing, (Allerton, 2002
Cited by 10 (4 self)
It has recently become feasible to compute information rates of finite-state source/channel models with not too many states. We review such methods and demonstrate their extension to compute upper and lower bounds on the information rate of very general (non-finite-state) channels by means of finite-state approximations.