Clustering with Bregman Divergences
 Journal of Machine Learning Research
, 2005
"... A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergence ..."
Abstract

Cited by 377 (55 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Second, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
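The hard clustering algorithm described in this abstract has a particularly simple form: the arithmetic mean minimizes the expected Bregman divergence to the points of a cluster, so the k-means centroid update carries over unchanged and only the assignment step depends on the chosen divergence. A minimal sketch of that idea (not the authors' implementation; the function names and NumPy structure are my own):

```python
import numpy as np

def sq_euclidean(x, mu):
    # Bregman divergence generated by phi(x) = ||x||^2: squared Euclidean distance.
    return float(np.sum((x - mu) ** 2))

def kl_divergence(x, mu):
    # Bregman divergence generated by negative entropy: relative entropy (KL),
    # for x, mu probability vectors; eps guards log(0).
    eps = 1e-12
    return float(np.sum(x * (np.log(x + eps) - np.log(mu + eps))))

def bregman_kmeans(X, k, divergence, n_iter=50, seed=0):
    """Hard Bregman clustering: assign each point to the centroid with the
    smallest divergence, then recompute each centroid as the cluster mean
    (optimal for every Bregman divergence)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.array([
            int(np.argmin([divergence(x, mu) for mu in centroids])) for x in X
        ])
        for j in range(k):
            if np.any(labels == j):  # keep old centroid if cluster is empty
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

With `sq_euclidean` the loop reduces to classical k-means; with `kl_divergence` on rows that are probability vectors, the same loop performs an information-theoretic clustering.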
Natural Type Selection in Adaptive Lossy Compression
, 2000
"... Consider approximate (lossy) matching of a source string P , with a random codebook generated from reproduction distribution Q, at a specified distortion d. Recent work determined the minimum coding rate, R 1 = R(P; Q; d), for this setting. We observe that for large word length and with high pro ..."
Abstract

Cited by 18 (4 self)
Consider approximate (lossy) matching of a source string P, with a random codebook generated from reproduction distribution Q, at a specified distortion d. Recent work determined the minimum coding rate, R1 = R(P, Q, d), for this setting. We observe that for large word length and with high probability, the matching codeword is typical with a distribution Q1 which is different from Q. If a new random codebook is generated from Q1, then the source string will favor codewords which are typical with a new distribution Q2, resulting in minimum coding rate R2 = R(P, Q1, d), and so on. We show that the sequences of distributions Q1, Q2, ... and rates R1, R2, ..., generated by this procedure, converge to an optimum reproduction distribution Q* and the rate-distortion function R(P, d), respectively. We also derive a fixed rate-distortion-slope version of this natural type selection process. In the latter case, an iteration of the process stochastically simulates a...
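The fixed-slope version of this iteration has a compact deterministic analogue: at Lagrangian slope s, the next reproduction distribution is the output marginal of the tilted channel W(y|x) ∝ Q(y) exp(−s·d(x, y)). A hedged sketch of that update rule (the symbol s, the matrix D, and the function names are my own; the paper's stochastic, codebook-based process is only mirrored here in expectation):

```python
import numpy as np

def nts_step(P, Q, D, s):
    """One fixed-slope step: re-estimate the reproduction distribution as the
    output marginal of W(y|x) proportional to Q(y) * exp(-s * d(x, y)).
    P: source pmf, shape (nx,); Q: reproduction pmf, shape (ny,);
    D: distortion matrix, shape (nx, ny)."""
    W = Q[None, :] * np.exp(-s * D)      # unnormalized channel
    W /= W.sum(axis=1, keepdims=True)    # normalize each row over y
    return P @ W                         # next reproduction marginal

def natural_type_selection(P, Q0, D, s, n_iter=200):
    """Iterate the fixed-slope update from an initial reproduction pmf Q0."""
    Q = Q0.copy()
    for _ in range(n_iter):
        Q = nts_step(P, Q, D, s)
    return Q
```

For a uniform binary source under Hamming distortion, for instance, the iteration pulls any skewed initial Q back toward the optimal (uniform) reproduction distribution.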
On approximating the rate regions for lossy source coding with coded and uncoded side information
 in Information Theory, 2008. ISIT 2008. IEEE International Symposium on
, 2008
"... Abstract — We derive new algorithms for approximating the rate regions for a family of source coding problems that includes lossy source coding, lossy source coding with uncoded side information at the receiver (the WynerZiv problem), and an achievability bound for lossy source coding with coded si ..."
Abstract

Cited by 5 (2 self)
Abstract — We derive new algorithms for approximating the rate regions for a family of source coding problems that includes lossy source coding, lossy source coding with uncoded side information at the receiver (the Wyner-Ziv problem), and an achievability bound for lossy source coding with coded side information at the receiver. The new algorithms generalize a recent approximation algorithm by Gu and Effros from lossless to lossy coding. In each case, prior information-theoretic descriptions of the desired regions are available but difficult to evaluate for example sources due to their dependence on auxiliary random variables. Our algorithm builds a linear program whose solution is no less than the desired lower bound and no greater than (1 + δ) times that optimal value. These guarantees are met even when the optimal value is unknown. Here δ is a parameter chosen by the user; the algorithmic complexity grows as δ approaches 0, at a rate governed by a constant that depends on the source coding problem and the alphabet sizes of the sources.
Computation and analysis of the N-layer scalable rate-distortion function
 IEEE Trans. Inf. Theory
, 2003
"... Abstract—Methods for determining and computing the ratedistortion (RD) bound forlayer scalable source coding of a finite memoryless source are considered. Optimality conditions were previously derived for two layers in terms of the reproduction distributions and. However, the ignored and seemingly ..."
Abstract

Cited by 4 (2 self)
Abstract—Methods for determining and computing the rate-distortion (RD) bound for N-layer scalable source coding of a finite memoryless source are considered. Optimality conditions were previously derived for two layers in terms of the reproduction distributions of the two layers. However, the ignored and seemingly insignificant boundary cases, where a reproduction probability equals zero and the corresponding conditional distribution is undefined, have major implications for the solution and its practical application. We demonstrate that, once the gap is filled and the result is extended to N layers, it is, in general, impractical to validate a tentative solution, as one has to verify the conditions for all conceivable conditional distributions at each reproduction point of zero probability. As an alternative computational approach, we propose an iterative algorithm that converges to the optimal joint reproduction distribution if initialized with a strictly positive distribution everywhere. For nonscalable coding (N = 1), the algorithm specializes to the Blahut–Arimoto algorithm. The algorithm may be used to directly compute the RD bound, or as an optimality-testing procedure by applying it to a perturbed tentative solution. We address two additional difficulties due to the higher dimensionality of the RD surface in the scalable (N > 1) case, namely, identifying the sufficient set of Lagrangian parameters to span the entire RD bound, and the problem of efficient navigation on the RD surface to compute a particular RD point. Index Terms—Alternating minimization, Kuhn–Tucker optimality conditions, rate distortion (RD), scalable source coding, successive refinement.
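For the nonscalable case (N = 1), the iteration this abstract refers to is the classical Blahut–Arimoto algorithm, which alternates between the optimal test channel at a fixed Lagrangian slope and the reproduction marginal it induces. A minimal single-layer sketch (my own notation; each slope value traces one point of the RD curve, in nats):

```python
import numpy as np

def blahut_arimoto(P, D, s, n_iter=500):
    """Blahut-Arimoto at Lagrangian slope s > 0.
    P: source pmf, shape (nx,); D: distortion matrix, shape (nx, ny).
    Returns (distortion, rate) for one point on the R(D) curve, rate in nats."""
    ny = D.shape[1]
    Q = np.full(ny, 1.0 / ny)                # initial reproduction marginal
    A = np.exp(-s * D)
    for _ in range(n_iter):
        W = Q[None, :] * A                   # test channel W(y|x) ~ Q(y) e^{-s d(x,y)}
        W /= W.sum(axis=1, keepdims=True)
        Q = P @ W                            # induced reproduction marginal
    dist = float(np.sum(P[:, None] * W * D))
    rate = float(np.sum(P[:, None] * W * np.log(W / Q[None, :])))
    return dist, rate
```

For a uniform binary source with Hamming distortion, the slope s = ln((1 − D)/D) recovers the textbook point (D, ln 2 − H_b(D)) on the rate-distortion curve.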
Estimation of the rate-distortion function
 2007. [Online]. Available: http://arxiv.org/abs/cs/0702018v1
"... Motivated by questions in lossy data compression and by theoretical considerations, this paper examines the problem of estimating the ratedistortion function of an unknown (not necessarily discretevalued) source from empirical data. The main focus is the behavior of the socalled “plugin ” estimat ..."
Abstract

Cited by 3 (0 self)
Motivated by questions in lossy data compression and by theoretical considerations, this paper examines the problem of estimating the rate-distortion function of an unknown (not necessarily discrete-valued) source from empirical data. The main focus is the behavior of the so-called “plug-in” estimator, which is simply the rate-distortion function of the empirical distribution of the observed data. Sufficient conditions are given for its consistency, and examples are provided to demonstrate that in certain cases it fails to converge to the true rate-distortion function. The analysis of the performance of the plug-in estimator is somewhat surprisingly intricate, even for stationary memoryless sources; the underlying mathematical problem is closely related to the classical problem of establishing the consistency of the maximum likelihood estimator in a parametric family. General consistency results are given for the plug-in estimator applied to a broad class of sources, including all stationary and ergodic ones. A more general class of estimation problems is also considered, arising in the context of lossy data compression when the allowed class of coding distributions is restricted; analogous results are developed for the plug-in estimator in that case. Finally, consistency theorems are formulated for modified (e.g., penalized) versions of the plug-in estimator, and for estimating the optimal reproduction distribution.
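For a finite-alphabet source the plug-in estimator is concrete: form the empirical distribution of the samples and compute its rate-distortion function, e.g. by a fixed-slope Blahut–Arimoto iteration. A hedged sketch under those assumptions (the function names and the slope parameterization s are my own; the paper also treats non-discrete sources, which this sketch does not cover):

```python
from collections import Counter
import numpy as np

def empirical_pmf(samples, alphabet_size):
    """Empirical distribution of integer-valued samples over {0, ..., alphabet_size-1}."""
    counts = Counter(samples)
    return np.array([counts[x] / len(samples) for x in range(alphabet_size)])

def plugin_rd_point(samples, alphabet_size, D, s, n_iter=500):
    """Plug-in estimate: one (distortion, rate) point of the rate-distortion
    function of the EMPIRICAL distribution, via Blahut-Arimoto at slope s."""
    P = empirical_pmf(samples, alphabet_size)
    ny = D.shape[1]
    Q = np.full(ny, 1.0 / ny)
    A = np.exp(-s * D)
    for _ in range(n_iter):
        W = Q[None, :] * A                 # tilted test channel, row-normalized below
        W /= W.sum(axis=1, keepdims=True)
        Q = P @ W
    dist = float(np.sum(P[:, None] * W * D))
    rate = float(np.sum(P[:, None] * W * np.log(W / Q[None, :])))  # nats
    return dist, rate
```

As the sample size grows, consistency (where it holds, per the paper) means these estimates approach the corresponding points of the true source's rate-distortion function.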
A Minimally Informative Likelihood for Decision Analysis: Robustness and Illustration
 Canadian Journal Statistics
, 1999
"... Here we use a class of likelihoods which makes weak assumptions on data generating mechanisms. These likelihoods may be appropriate for data sets where it is difficult to propose physically motivated models. We give some properties of these likelihoods, showing how they can be computed numerically b ..."
Abstract

Cited by 3 (2 self)
Here we use a class of likelihoods that make weak assumptions about the data-generating mechanism. These likelihoods may be appropriate for data sets where it is difficult to propose physically motivated models. We give some properties of these likelihoods, showing how they can be computed numerically by use of the Blahut-Arimoto algorithm. Then, in the context of a data set for which no plausible physical model is apparent, we show how these likelihoods give useful inferences for the location of a distribution. The plausibility of the inferences is enhanced by the extensive robustness analysis these likelihoods permit.
On the Continuity of Achievable Rate Regions for Source Coding over Networks
"... Abstract — The continuity property of achievable rate regions for source coding over networks is considered. We show ratedistortion regions are continuous with respect to distortion vectors. Then we focus on the continuity of lossless rate regions with respect to source distribution: First, the proo ..."
Abstract

Cited by 2 (0 self)
Abstract — The continuity property of achievable rate regions for source coding over networks is considered. We show that rate-distortion regions are continuous with respect to distortion vectors. Then we focus on the continuity of lossless rate regions with respect to the source distribution. First, the proof of continuity for general networks with independent sources is given; then, for the case of dependent sources, continuity is proven both in examples where one-letter characterizations are known and in examples where they are not; the proofs in the latter case rely on the concavity of the rate regions for those networks.
Iterative computation of rate-distortion bounds for scalable source coding
 In IEEE Int. Symposium on Information Theory
, 2000
"... We consider Nlayer scalable source coding of a finite memoryless ..."
Abstract

Cited by 1 (1 self)
We consider N-layer scalable source coding of a finite memoryless ...