Results 1–10 of 42
Clustering with Bregman Divergences
 Journal of Machine Learning Research
, 2005
Abstract

Cited by 377 (55 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
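The hard-clustering scheme this abstract describes reduces to a k-means loop in which only the assignment step changes: each point goes to the center with the smallest Bregman divergence, while the update step is always the arithmetic mean (a key property of Bregman divergences). A minimal sketch in Python/NumPy, using the generalized I-divergence as an example; the function names and the deterministic seeding are my own, not the paper's:

```python
import numpy as np

def kl_div(x, y):
    # generalized I-divergence: the Bregman divergence of phi(x) = sum x log x
    return np.sum(x * np.log(x / y) - x + y, axis=-1)

def sq_euclidean(x, y):
    # the Bregman divergence of phi(x) = ||x||^2: recovers classical k-means
    return np.sum((x - y) ** 2, axis=-1)

def bregman_hard_cluster(X, k, divergence, n_iter=50):
    # simple deterministic seeding (an illustrative choice): evenly spaced rows
    centers = X[:: max(1, len(X) // k)][:k].astype(float)
    for _ in range(n_iter):
        # assignment step: nearest center under the chosen Bregman divergence
        dist = np.stack([divergence(X, c) for c in centers], axis=1)
        labels = dist.argmin(axis=1)
        # update step: the optimal representative is always the arithmetic mean,
        # for every Bregman divergence -- this keeps the loop k-means-simple
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

Calling `bregman_hard_cluster(X, k, sq_euclidean)` gives ordinary k-means; `kl_div` on positive data gives the information-theoretic variant, which is exactly the unification the paper formalizes.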
Survey of clustering data mining techniques
, 2002
Abstract

Cited by 351 (0 self)
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique ...
A divisive information-theoretic feature clustering algorithm for text classification
 Journal of Machine Learning Research
, 2003
Abstract

Cited by 129 (14 self)
High dimensionality of text can be a deterrent in applying complex learners such as Support Vector Machines to the task of text classification. Feature clustering is a powerful alternative to feature selection for reducing the dimensionality of text data. In this paper we propose a new information-theoretic divisive algorithm for feature/word clustering and apply it to text classification. Existing techniques for such “distributional clustering” of words are agglomerative in nature and result in (i) suboptimal word clusters and (ii) high computational cost. In order to explicitly capture the optimality of word clusters in an information-theoretic framework, we first derive a global criterion for feature clustering. We then present a fast, divisive algorithm that monotonically decreases this objective function value. We show that our algorithm minimizes the “within-cluster Jensen-Shannon divergence” while simultaneously maximizing the “between-cluster Jensen-Shannon divergence”. In comparison to the previously proposed agglomerative strategies our divisive algorithm is much faster and achieves comparable or higher classification accuracies. We further show that feature clustering is an effective technique for building smaller class models in hierarchical classification. We present detailed experimental results using Naive Bayes and Support Vector Machines on the 20 Newsgroups data set and a 3-level hierarchy of HTML documents collected from the Open Directory project (www.dmoz.org).
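The divisive algorithm alternates, k-means style, between computing each cluster's representative (the prior-weighted mean of its word distributions) and reassigning each word to the cluster whose representative is closest in KL divergence; the paper shows this monotonically decreases the within-cluster Jensen-Shannon objective. A rough sketch under those definitions; the array shapes and the restart rule for empty clusters are my own simplifications:

```python
import numpy as np

def kl(p, q):
    # KL divergence between discrete distributions (assumes q > 0 where p > 0)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def divisive_cluster(P, priors, k, n_iter=30, seed=0):
    # P: rows are word distributions p(C|w); priors: word priors p(w)
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, len(P))
    for _ in range(n_iter):
        means = np.stack([
            # cluster representative: prior-weighted mean of member distributions
            np.average(P[labels == j], axis=0, weights=priors[labels == j])
            if np.any(labels == j)
            else P[rng.integers(len(P))]  # restart an empty cluster at a random word
            for j in range(k)
        ])
        # reassign each word to the representative closest in KL divergence
        labels = np.array([np.argmin([kl(p, m) for m in means]) for p in P])
    return labels
```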
Simultaneous feature selection and clustering using mixture models
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Abstract

Cited by 101 (1 self)
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering
 KDD
, 2005
Abstract

Cited by 47 (2 self)
Heterogeneous data co-clustering has attracted more and more attention in recent years due to its high impact on various applications. While co-clustering algorithms for two types of heterogeneous data (denoted pairwise co-clustering), such as documents and terms, have been well studied in the literature, work on more types of heterogeneous data (denoted high-order co-clustering) is still very limited. As an attempt in this direction, in this paper we work on a specific case of high-order co-clustering in which a central type of objects connects the other types so as to form a star structure of interrelationships. This case is a good abstraction for many real-world applications, such as the co-clustering of categories, documents and terms in text mining. We treat such problems as the fusion of multiple pairwise co-clustering subproblems under the constraint of the star structure. Accordingly, we propose the concept of consistent bipartite graph co-partitioning, and develop an algorithm based on semidefinite programming (SDP) for efficient computation of the clustering results. Experiments on toy problems and real data both verify the effectiveness of the proposed method.
LEARNER: A System for Acquiring Commonsense Knowledge by Analogy
 in Proceedings of Second International Conference on Knowledge Capture (KCAP
, 2003
Abstract

Cited by 34 (4 self)
One of the long-term goals of Artificial Intelligence is construction of a machine that is capable of reasoning about the everyday world the way humans are. In this paper, I first argue that construction of a large collection of statements about the everyday world (a repository of commonsense knowledge) is a valuable step towards this long-term goal. Then, I point out that volunteer contributors over the Internet — a frequently overlooked source of knowledge — can be tapped to construct such a knowledge repository. To operationalize construction of a large commonsense knowledge repository by volunteer contributors, I then introduce cumulative analogy, a class of analogy-based reasoning algorithms that leverage existing knowledge to pose knowledge acquisition questions to the volunteer contributors. The algorithms have been implemented and deployed as the Learner system. To date, about 3,400 volunteer contributors have interacted with the system over the course of 11 months, increasing a starting collection of 47,147 statements by 362% to a total of 217,971. The deployed system and the growing collection of knowledge it acquired are publicly available from ...
Deterministic pivoting algorithms for constrained ranking and clustering problems
, 2007
Abstract

Cited by 27 (4 self)
We consider ranking and clustering problems related to the aggregation of inconsistent information, in particular, rank aggregation, (weighted) feedback arc set in tournaments, consensus and correlation clustering, and hierarchical clustering. Ailon, Charikar, and Newman [4], Ailon and Charikar [3], and Ailon [2] proposed randomized constant-factor approximation algorithms for these problems, which recursively generate a solution by choosing a random vertex as “pivot” and dividing the remaining vertices into two groups based on the pivot vertex. In this paper, we answer an open question in these works by giving deterministic approximation algorithms for these problems. The analysis of our algorithms is simpler than the analysis of the randomized algorithms in [4], [3] and [2]. In addition, we consider the problem of finding minimum-cost rankings and clusterings which must obey certain constraints (e.g. an input partial order in the case of ranking problems), which were introduced by Hegde and Jain [25] (see also [34]). We show that the first type of algorithms we propose can also handle these constrained problems. In addition, we show that in the case of a rank aggregation or consensus clustering problem, if the input rankings or clusterings obey the constraints, then we can always ensure that the output of ...
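The randomized pivot scheme being derandomized here is simple: pick a pivot vertex, put every remaining vertex that is similar to the pivot into its cluster, and recurse on the rest. A sketch of the randomized variant for correlation clustering (the paper's contribution is choosing the pivot deterministically; the `similar` predicate is a stand-in for the “+” edges of an instance):

```python
import random

def pivot_cluster(vertices, similar):
    # similar(u, v) -> True when the pair carries a "+" label (should co-cluster)
    rng = random.Random(0)  # fixed seed for reproducibility of this sketch
    clusters = []
    remaining = list(vertices)
    while remaining:
        pivot = rng.choice(remaining)
        # the pivot's cluster: the pivot plus everything still similar to it
        cluster = [v for v in remaining if v == pivot or similar(pivot, v)]
        clusters.append(cluster)
        remaining = [v for v in remaining if v not in cluster]
    return clusters
```

On a consistent instance (similarity is transitively closed) the output is exact regardless of pivot order; the approximation guarantees concern inconsistent inputs.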
Seeding nonnegative matrix factorization with the spherical k-means clustering
, 2003
Abstract

Cited by 23 (1 self)
 Add to MetaCart
The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.
Locally adaptive metrics for clustering high dimensional data
, 2006
Abstract

Cited by 21 (7 self)
Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction techniques, and does not assume any data distribution model. Our method associates with each cluster a weight vector whose values capture the relevance of features within the corresponding cluster. We experimentally demonstrate the gain in performance our method achieves with respect to competitive methods, using both synthetic and real datasets. In particular, our results show the feasibility of the proposed technique for performing simultaneous clustering of genes and conditions in gene expression data, and for clustering very high dimensional data such as text data.
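The local weighting idea can be sketched as a k-means variant in which each cluster keeps its own per-dimension weight vector, recomputed from the within-cluster spread so that noisy dimensions are downweighted. The exponential weighting and deterministic seeding below are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def lac(X, k, h=1.0, n_iter=20):
    # Locally adaptive clustering sketch: cluster j keeps a weight vector w[j];
    # dimensions with large within-cluster spread get small weight.
    n, d = X.shape
    centers = X[:: max(1, n // k)][:k].astype(float)  # simple deterministic seed
    w = np.full((k, d), 1.0 / np.sqrt(d))
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # assign each point using its per-cluster weighted squared distance
        dist = np.stack([np.sum(w[j] * (X - centers[j]) ** 2, axis=1)
                         for j in range(k)], axis=1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                continue
            centers[j] = pts.mean(axis=0)
            spread = ((pts - centers[j]) ** 2).mean(axis=0)
            w[j] = np.exp(-spread / h)    # exponential weighting (assumed form)
            w[j] /= np.linalg.norm(w[j])  # keep ||w[j]|| = 1
    return labels, w
```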
Feature Selection in Mixture-Based Clustering
, 2002
Abstract

Cited by 18 (0 self)
While there exist many approaches to clustering, the important issue of feature selection, that is, what attributes of the data are relevant, is rarely addressed. Feature selection for clustering is made difficult by the absence of class labels to guide the search. In this paper, we propose two approaches to deal with this problem. In the first one, instead of making hard selections, we estimate how salient each feature is. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami's mutual-information-based feature relevance criterion to the unsupervised case. Implementation is carried out by a backward search scheme. The resulting algorithm can be classified as a "wrapper", since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.