Results 1–10 of 29
Clustering with Bregman Divergences
JOURNAL OF MACHINE LEARNING RESEARCH, 2005
Cited by 310 (52 self)
Abstract:
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Second, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
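The unification the abstract describes can be sketched in a few lines. The following is a minimal, hedged illustration (not the authors' implementation; the function names and toy setup are hypothetical): for any Bregman divergence the optimal cluster representative is the arithmetic mean of its points, so only the assignment step differs from classical k-means.

```python
import numpy as np

def bregman_kmeans(X, k, divergence, n_iter=20, seed=0):
    """Hard clustering under a Bregman divergence (illustrative sketch).

    Only the assignment step depends on the chosen divergence: the
    update step always uses the arithmetic mean, which minimizes the
    total Bregman divergence to the points of a cluster.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment: send each point to the center with smallest divergence.
        dists = np.array([[divergence(x, c) for c in centers] for x in X])
        labels = dists.argmin(axis=1)
        # Update: the mean is optimal for every Bregman divergence;
        # keep the old center if a cluster happens to be empty.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Squared Euclidean distance recovers classical k-means.
def sq_euclidean(x, c):
    return float(np.sum((x - c) ** 2))
```

With the generalized I-divergence in place of `sq_euclidean`, the same loop would perform information-theoretic clustering on nonnegative data, as the abstract indicates.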
Game Theory, Maximum Entropy, Minimum Discrepancy And Robust Bayesian Decision Theory
ANNALS OF STATISTICS, 2004
Information-theoretic measures for knowledge discovery and data mining, in: Entropy Measures, Maximum Entropy and Emerging Applications, Karmeshu (Ed
in Entropy Measures, Maximum Entropy and Emerging Applications, 2003
Cited by 20 (6 self)
Abstract:
A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regularities and patterns are observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
A Continuous Metric Scaling Solution for a Random Variable
Journal of Multivariate Analysis, 1994
Cited by 19 (15 self)
Abstract:
As a generalization of the classical Metric Scaling solution for a finite set of points, a countable set of uncorrelated random variables is obtained from an arbitrary continuous random variable X. The properties of these variables allow us to regard them as Principal Axes for X with respect to the distance function d(u, v) = √|u − v|. Explicit results are obtained for uniform and negative exponential random variables. Keywords and Phrases: Principal components of a stochastic process, Principal Coordinate Analysis. AMS Subject Classification: 62H25. 1 Introduction. Metric Scaling or Principal Coordinate Analysis, introduced by Torgerson [14] and especially Gower [9], is a method of ordination aiming to provide a graphical representation of a finite set of n elements. The method obtains an n × m matrix X from an n × n Euclidean distance matrix Δ = (δ_ij). The set of n rows of X, considered as points in R^m, has interdistances which reproduce those in Δ ...
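The finite-point construction this entry generalizes (Torgerson–Gower classical scaling: double-center the squared-distance matrix and take the leading eigenvectors) can be sketched as follows. This is a generic illustration of the classical method, not code from the paper.

```python
import numpy as np

def principal_coordinates(D, m=2):
    """Classical Metric Scaling / Principal Coordinate Analysis (sketch).

    D is an n x n matrix of Euclidean distances; the returned n x m
    coordinate matrix has interpoint distances reproducing those in D
    (exactly, when D is Euclidean and m is large enough).
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered Gram matrix
    w, V = np.linalg.eigh(B)               # eigenvalues in ascending order
    order = np.argsort(w)[::-1]            # largest eigenvalues first
    w, V = w[order], V[:, order]
    # Scale eigenvectors by sqrt of (nonnegative) eigenvalues to get coordinates.
    return V[:, :m] * np.sqrt(np.maximum(w[:m], 0.0))
```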
The Proximity of an Individual to a Population With Applications in Discriminant Analysis
1995
Cited by 18 (10 self)
Abstract:
We develop a proximity function between an individual and a population from a distance between multivariate observations. We study some properties of this construction and apply it to a distance-based discrimination rule, which contains the classic linear discriminant function as a particular case. Additionally, this rule can be used advantageously for categorical or mixed variables, or in problems where a probabilistic model is not well determined. This approach is illustrated and compared with other classic procedures using four real data sets. Keywords: Categorical and mixed data; Distances between observations; Multidimensional scaling; Discrimination; Classification rules. AMS Subject Classification: 62H30. The authors thank M. Abrahamowicz, J. C. Gower and M. Greenacre for their helpful comments, and W. J. Krzanowski for providing us with a data set and his quadratic location model program. Work supported in part by CGYCIT grant PB930784. Authors' address: Departam...
Max-margin min-entropy models
In AISTATS, 2012
Cited by 5 (1 self)
Abstract:
We propose a new family of latent variable models called max-margin min-entropy (M3E) models, which define a distribution over the output and the hidden variables conditioned on the input. Given an input, an M3E model predicts the output with the smallest corresponding Rényi entropy of the generalized distribution. This is equivalent to minimizing a score that consists of two terms: (i) the negative log-likelihood of the output, ensuring that the output has a high probability; and (ii) a measure of uncertainty over the distribution of the hidden variables conditioned on the input and the output, ensuring that there is little confusion in the values of the hidden variables. Given a training dataset, the parameters of an M3E model are learned by maximizing the margin between the Rényi entropies of the ground-truth output and all other incorrect outputs. Training an M3E model can be viewed as minimizing an upper bound on a user-defined loss, and includes, as a special case, the latent support vector machine framework. We demonstrate the efficacy of M3E models on two standard machine learning applications, discriminative motif finding and image classification, using publicly available datasets.
Indicators of the Interdisciplinarity of Journals: Diversity, Centrality, and Citations
Cited by 3 (1 self)
Abstract:
A citation-based indicator for interdisciplinarity has been missing hitherto among the set of available journal indicators. In this study, we investigate betweenness centrality, entropy, the Gini coefficient, and more recently proposed measures for diversity that combine the statistics of vectors and distances in networks, in terms of their potential to fill this gap. The effects of various normalizations are specified and measured using the matrix of 8,207 journals contained in the Journal Citation Reports of the (Social) Science Citation Index. Betweenness centrality in (1-mode) affiliation networks provides an indicator outperforming betweenness in the (2-mode) citation network. Entropy as a vector-based indicator performs better than the Gini coefficient, but is sensitive to size. Science and Nature, for example, are indicated at the top of the list. The new diversity measure provides reasonable results when (1 − cosine) is assumed as a measure for the distance, but results using Euclidean distances are difficult to interpret.
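The two vector-based indicators this study compares can be computed directly from a journal's citation distribution. The sketch below (hypothetical helper names, not the authors' code) also illustrates the size-sensitivity noted in the abstract: spreading citations evenly over more categories raises entropy, while the Gini coefficient stays at zero for any even distribution.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy of a citation distribution (vector-based diversity)."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log(p)).sum())

def gini(counts):
    """Gini coefficient of a citation distribution:
    0 for a perfectly even distribution, approaching 1 for a concentrated one."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = len(x)
    return float((2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum()))
```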
C.: Measuring diversity: the importance of species similarity
Cited by 3 (1 self)
Abstract:
Realistic measures of biodiversity should reflect not only the relative abundances of species, but also the differences between them. We present a natural family of diversity measures taking both factors into account. This is not just another addition to the already long list of diversity indices: instead, a single formula subsumes many of the most popular indices, including Shannon’s, Simpson’s, species richness, and Rao’s quadratic entropy. These popular indices can then be used and understood in a unified way, and the relationships between them are made plain. The new measures are, moreover, effective numbers, so that percentage changes and ratio comparisons of diversity value are meaningful. We advocate the use of diversity profiles, which provide a faithful graphical representation of the shape of a community; they show how the perceived diversity changes as the emphasis shifts from rare to common species. Communities can usefully be compared by comparing their diversity profiles. We show by example that this is a far more subtle method than any relying on a single statistic. Some ecologists view diversity indices with suspicion, questioning whether they are biologically meaningful. By dropping the naive assumption that distinct species have nothing in common, working with effective numbers, and using diversity profiles, we arrive at a system of diversity measurement that should lay much of this suspicion to rest. Key words: diversity, biodiversity, entropy, quadratic entropy, species similarity, model, effective number, diversity profile, microbial diversity.
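The "single formula" alluded to here is the similarity-sensitive diversity of order q; with the identity similarity matrix it reduces to the Hill numbers (effective numbers of species). A hedged sketch of the family, assuming the usual parameterization (abundance vector p, species-similarity matrix Z, order q):

```python
import numpy as np

def diversity(p, Z=None, q=2.0):
    """Similarity-sensitive diversity of order q (illustrative sketch).

    p: relative abundances; Z: species-similarity matrix, where Z = I is
    the naive assumption that distinct species have nothing in common.
    With Z = I this gives the Hill numbers: species richness at q = 0,
    exp(Shannon entropy) as q -> 1, and inverse Simpson at q = 2.
    The result is an effective number of species.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    Z = np.eye(len(p)) if Z is None else np.asarray(Z, dtype=float)
    Zp = Z @ p                      # expected similarity seen by each species
    mask = p > 0
    if abs(q - 1.0) < 1e-9:         # the q -> 1 limit is a geometric mean
        return float(np.exp(-(p[mask] * np.log(Zp[mask])).sum()))
    return float((p[mask] * Zp[mask] ** (q - 1)).sum() ** (1.0 / (1.0 - q)))
```

Plotting `diversity(p, Z, q)` against q yields the diversity profile the abstract advocates.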
Chessel D: From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis
 Journal of Theoretical Biology
Cited by 3 (0 self)
Abstract:
This paper presents a new ordination method to compare several communities containing species that differ according to their taxonomic, morphological or biological features. The objective is first to find dissimilarities among communities from the knowledge about differences among their species, and second to describe these dissimilarities with regard to the feature diversity within communities. In 1986, Rao initiated a general framework for analysing the extent of the diversity. He defined a diversity coefficient called quadratic entropy and a dissimilarity coefficient, and proposed a decomposition of this diversity coefficient in a way similar to ANOVA. Furthermore, Gower and Legendre (1986) built a weighted principal coordinate analysis. Using the previous context, we propose a new method called the double principal coordinate analysis (DPCoA) to analyse the relation between two kinds of data. The first contains differences among species (dissimilarity matrix); the second, the species distribution among communities (abundance or presence/absence matrix). A multidimensional space assembling the species points and the community points is built. The species points define the original differences between species and the community points define the deduced differences between communities. Furthermore, this multidimensional space is linked with the diversity decomposition into between-community and within-community diversities. One looks for axes that provide a graphical ordination of the communities and project the species onto them. An illustration is proposed comparing bird communities which live in different areas under Mediterranean bioclimates. Compared to some existing methods, the double principal coordinate analysis can provide a typology of communities taking ...
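Rao's quadratic entropy and the ANOVA-like between-community dissimilarity that DPCoA builds on can be sketched as follows. This is a generic illustration of Rao's coefficients (with hypothetical function names), not the DPCoA implementation itself.

```python
import numpy as np

def rao_quadratic_entropy(p, d):
    """Rao's quadratic entropy: the expected dissimilarity between two
    individuals drawn at random from a community with abundances p."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    return float(p @ np.asarray(d, dtype=float) @ p)

def rao_dissimilarity(p1, p2, d):
    """Rao's dissimilarity between two communities: the expected
    cross-community dissimilarity minus the mean within-community
    quadratic entropy (the between-community term of the decomposition)."""
    p1 = np.asarray(p1, dtype=float) / np.sum(p1)
    p2 = np.asarray(p2, dtype=float) / np.sum(p2)
    cross = float(p1 @ np.asarray(d, dtype=float) @ p2)
    within = 0.5 * (rao_quadratic_entropy(p1, d) + rao_quadratic_entropy(p2, d))
    return cross - within
```

Feeding the resulting community-by-community dissimilarities into a principal coordinate analysis gives the flavor of ordination described in the abstract.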