Results 1 - 7 of 7
Clustering with Bregman Divergences
 Journal of Machine Learning Research
, 2005
"... A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergence ..."
Abstract

Cited by 310 (52 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
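The hard clustering scheme this abstract describes can be sketched in a few lines. The sketch below assumes NumPy; the two divergence functions and the data are illustrative, and the point it demonstrates is that the centroid update is the plain arithmetic mean for every Bregman divergence.

```python
# Minimal sketch of Bregman hard clustering (generalized k-means).
# The divergences and data are illustrative assumptions.
import numpy as np

def squared_euclidean(x, c):
    # Bregman divergence of phi(x) = ||x||^2
    return np.sum((x - c) ** 2, axis=-1)

def kl_divergence(x, c):
    # Generalized I-divergence, the Bregman divergence of
    # phi(x) = sum x log x; requires strictly positive data.
    return np.sum(x * np.log(x / c) - x + c, axis=-1)

def bregman_kmeans(X, k, divergence, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the chosen divergence.
        d = np.stack([divergence(X, c) for c in centroids])  # (k, n)
        labels = d.argmin(axis=0)
        # Update step: for ANY Bregman divergence the optimal centroid
        # is the arithmetic mean (a centroid keeps its old value if its
        # cluster happens to be empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

With `squared_euclidean` this reduces exactly to classical k-means; swapping in `kl_divergence` on positive data gives an information-theoretic clustering with the same two-step loop.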
Natural Type Selection in Adaptive Lossy Compression
, 2000
"... Consider approximate (lossy) matching of a source string P , with a random codebook generated from reproduction distribution Q, at a specified distortion d. Recent work determined the minimum coding rate, R 1 = R(P; Q; d), for this setting. We observe that for large word length and with high pro ..."
Abstract

Cited by 13 (4 self)
Consider approximate (lossy) matching of a source string P, with a random codebook generated from reproduction distribution Q, at a specified distortion d. Recent work determined the minimum coding rate, R1 = R(P, Q, d), for this setting. We observe that for large word length and with high probability, the matching codeword is typical with a distribution Q1 which is different from Q. If a new random codebook is generated according to Q1, then the source string will favor code words which are typical with a new distribution Q2, resulting in minimum coding rate R2 = R(P, Q1, d), and so on. We show that the sequences of distributions Q1, Q2, ... and rates R1, R2, ..., generated by this procedure, converge to an optimum reproduction distribution Q* and the rate-distortion function R(P, d), respectively. We also derive a fixed rate-distortion slope version of this natural type selection process. In the latter case, an iteration of the process stochastically simulates a...
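The iteration in this abstract can be simulated for a toy Bernoulli source under Hamming distortion. The sketch below is a hedged illustration in a fixed-rate flavor, where the minimum-distortion codeword in a finite random codebook is taken as the match; the word length, codebook size and all parameters are assumptions for the example, not values from the paper.

```python
# Toy Monte-Carlo sketch of natural type selection for a Bernoulli(p)
# source with Hamming distortion. Fixed-rate flavor: the best-matching
# (minimum-distortion) codeword defines the next reproduction type.
# All parameters are illustrative assumptions.
import numpy as np

def natural_type_selection(p=0.5, q0=0.9, n=200, codebook_size=4000,
                           rounds=8, seed=0):
    rng = np.random.default_rng(seed)
    q, history = q0, [q0]
    for _ in range(rounds):
        source = rng.random(n) < p                      # source word ~ P
        codebook = rng.random((codebook_size, n)) < q   # random codebook ~ Q_k
        distortion = (codebook != source).mean(axis=1)  # per-word Hamming
        match = codebook[distortion.argmin()]           # best-matching word
        q = float(match.mean())   # Q_{k+1} = empirical type of the match
        history.append(q)
    return history
```

Each round regenerates the codebook from the type of the previous match; over successive rounds the reproduction distribution drifts away from the initial Q toward a better one, mirroring the convergence result the abstract states.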
A Minimally Informative Likelihood for Decision Analysis: Robustness and Illustration
 Canadian Journal of Statistics
, 1999
"... Here we use a class of likelihoods which makes weak assumptions on data generating mechanisms. These likelihoods may be appropriate for data sets where it is difficult to propose physically motivated models. We give some properties of these likelihoods, showing how they can be computed numerically b ..."
Abstract

Cited by 3 (2 self)
Here we use a class of likelihoods which makes weak assumptions on data generating mechanisms. These likelihoods may be appropriate for data sets where it is difficult to propose physically motivated models. We give some properties of these likelihoods, showing how they can be computed numerically by use of the Blahut-Arimoto algorithm. Then, in the context of a data set for which no plausible physical model is apparent, we show how these likelihoods give useful inferences for the location of a distribution. The plausibility of the inferences is enhanced by the extensive robustness analysis these likelihoods permit.
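As a generic illustration of the Blahut-Arimoto algorithm the abstract refers to, here is the standard rate-distortion form of the iteration; this is not the paper's specific minimally informative likelihood computation, and the binary source, Hamming distortion and slope parameter in the test are assumptions for the example.

```python
# Generic sketch of the Blahut-Arimoto iteration for the rate-distortion
# function R(D), with a Lagrangian slope parameter beta (in nats).
import numpy as np

def blahut_arimoto_rd(p, dist, beta, n_iter=2000):
    """p: source distribution (length m); dist: (m, k) distortion matrix;
    beta: slope parameter in nats. Returns (D, R) in nats."""
    m, k = dist.shape
    q = np.full(k, 1.0 / k)                 # reproduction marginal
    for _ in range(n_iter):
        # Optimal test channel for the current reproduction marginal
        Q = q * np.exp(-beta * dist)        # shape (m, k)
        Q /= Q.sum(axis=1, keepdims=True)
        q = p @ Q                           # re-derive the marginal
    D = float(np.sum(p[:, None] * Q * dist))           # average distortion
    R = float(np.sum(p[:, None] * Q * np.log(Q / q)))  # mutual information
    return D, R
```

For a binary source with Hamming distortion the returned (D, R) pair lands on the known curve R(D) = h(p) - h(D), which makes the sketch easy to sanity-check.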
Scalable Clustering Algorithms
, 2005
"... I would like to thank a number of people who contributed in their own unique ways in making my experience as a graduate student very rewarding. First, I would like to thank my advisor Prof. Joydeep Ghosh for his support and guidance over the years. Prof. Ghosh always showed great faith in my capabil ..."
Abstract

Cited by 2 (0 self)
I would like to thank a number of people who contributed in their own unique ways in making my experience as a graduate student very rewarding. First, I would like to thank my advisor Prof. Joydeep Ghosh for his support and guidance over the years. Prof. Ghosh always showed great faith in my capabilities and has allowed me to work quite independently at times, while providing invaluable guidance when necessary. His friendly accessibility and constant encouragement have been among the key factors that have helped me mature as a researcher. I am grateful to Prof. Inderjit Dhillon for his enthusiastic encouragement and for often pushing me to think more deeply about certain research issues. I have learnt various other aspects of being a good researcher from him, including technical writing as well as presentation skills. I would also like to thank Prof. Ray Mooney for introducing me to Machine Learning, and for giving me the opportunity to collaborate with him on several occasions. I want to thank my committee members for their input at various stages of my thesis work. I would especially like to thank Srujana Merugu for being my most active
An Information Theoretic Approach To The Study Of Auditory Coding
, 2003
"... This dissertation develops information theoretic tools to study properties of the neural code used by the auditory system, and applies them to electrophysiological recordings in three auditory processing stations: auditory cortex (AI), thalamus (MGB) and inferior colliculus (IC). It focuses on seve ..."
Abstract

Cited by 1 (0 self)
This dissertation develops information theoretic tools to study properties of the neural code used by the auditory system, and applies them to electrophysiological recordings in three auditory processing stations: auditory cortex (AI), thalamus (MGB) and inferior colliculus (IC). It focuses on several aspects of the neural code. First, robust estimation of the information carried by spike trains is developed, using a variety of dimensionality reduction techniques. Second, measures of informational redundancy in small groups of neurons are developed. These are applied to neural activity in a series of brain regions, demonstrating a process of redundancy reduction in the ascending processing pathway. Finally, a method to identify relevant features, by filtering out the effects of lower processing stations, is developed. This approach is shown to have numerous applications in domains extending far beyond neural coding. These three components are summarized below. The problem of the
"... Since the publication of Shannon’s theory of one terminal source coding, a number of interesting extensions have been derived by researchers such as SlepianWolf, Wyner, AhlswedeKörner, WynerZiv, Berger et al. and BergerYeung. Specifically, the achievable rate or ratedistortion region has been d ..."
Abstract
Since the publication of Shannon's theory of one-terminal source coding, a number of interesting extensions have been derived by researchers such as Slepian-Wolf, Wyner, Ahlswede-Körner, Wyner-Ziv, Berger et al. and Berger-Yeung. Specifically, the achievable rate or rate-distortion region has been described by a first order information-theoretic functional of the source statistics in each of the above cases. At the same time various source coding problems have also remained unsolved: Notable two-terminal examples include the joint distortion problem, where both sources are reconstructed under a combined distortion criterion, as well as the partial side information problem, where one source is reconstructed under a distortion criterion using information about the other (side information) available at a certain rate (partially). In this thesis, we describe the rate-distortion region for each of these open problems by an infinite order information-theoretic functional of the source distribution. However, our description does not immediately give a plot of or even an algorithm to plot the corresponding achievable region. In fact, we set distributed source coding problems in a general framework and take a unified structural view of not only the above open problems but any two-terminal
An Algorithm for Maximizing Expected Log Investment Return
, 1983
"... AbstractLet the random (stock market) vector X 2 0 be drawn according to a known distribution function F(x), x E R”. A logoptimal portfolio b * is any portfolio b achieving maximal expected log return W * = sup,, E In b’X, where the supremum is over the simplex b 2 0, Cr, b, = 1. An algorithm is ..."
Abstract
Abstract - Let the random (stock market) vector X ≥ 0 be drawn according to a known distribution function F(x), x ∈ R^m. A log-optimal portfolio b* is any portfolio b achieving maximal expected log return W* = sup_b E ln b^t X, where the supremum is over the simplex b ≥ 0, Σ_{i=1}^m b_i = 1. An algorithm is presented for finding b*. The algorithm consists of replacing the portfolio b by the expected portfolio b', b'_i = E(b_i X_i / b^t X), corresponding to the expected proportion of holdings in each stock after one market period. The improvement in W(b) after each iteration is lower-bounded by the Kullback-Leibler information number D(b'||b) between the current and updated portfolios. Thus the algorithm monotonically improves the return W. An upper bound on W* is given in terms of the current portfolio and the gradient, and the convergence of the algorithm is established. I. INTRODUCTION. Let X_i denote the random capital return from the investment of one unit in the i-th stock, i = 1, 2, ..., m. For example, if stock i is bought for 20 and sold for 30, then X_i = 1.5. The stock vector X is a nonnegative vector-valued random variable drawn according to a known distribution function F(x), x ∈ R^m. A portfolio b = (b_1, b_2, ..., b_m)^t, b_i ≥ 0, Σ b_i = 1, is an allocation of investment capital over the stocks X = (X_1, X_2, ..., X_m)^t. The expected log return W(b) and the maximal expected log return W* are given by W(b) = E ln b^t X = ∫ ln b^t x dF(x), W* = max_b W(b). (1.1) We wish to determine the portfolio b* (unique if the support set of X is of full dimension) that maximizes the expected log return W(b). A discussion of the naturalness of this objective can be found in the series of papers by
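The update step in this abstract translates almost directly into code. The sketch below assumes NumPy and takes F to be the empirical distribution of equally likely rows of a matrix X; the two-outcome market in the usage check is hypothetical, used only for illustration.

```python
# Sketch of the iteration b'_i = E(b_i X_i / b^t X), with expectations
# taken under the empirical distribution of the rows of X. The market
# data is hypothetical; this is an illustration, not the paper's setup.
import numpy as np

def log_return(b, X):
    # W(b) = E ln b^t X under the empirical distribution
    return float(np.mean(np.log(X @ b)))

def update(b, X):
    # b'_i = E(b_i X_i / b^t X): expected proportion of holdings in
    # stock i after one market period; the result stays on the simplex.
    return np.mean(b * X / (X @ b)[:, None], axis=0)

def log_optimal_portfolio(X, n_iter=200):
    # Iterate the update starting from the uniform portfolio; W(b)
    # improves monotonically by at least D(b'||b) per step.
    b = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(n_iter):
        b = update(b, X)
    return b
```

Since Σ_i b_i X_i / b^t X averages to exactly 1, each update automatically stays on the simplex, and the expected log return never decreases from one iteration to the next, matching the monotonicity claim in the abstract.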