A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation (2004) [29 citations — 8 self]

by Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, Dharmendra S. Modha
In KDD

Abstract:

Co-clustering is a powerful data mining technique with varied applications such as text clustering, microarray analysis and recommender systems. Recently, an information-theoretic co-clustering approach applicable to empirical joint probability distributions was proposed. In many situations, co-clustering of more general matrices is desired. In this paper, we present a substantially generalized co-clustering framework wherein any Bregman divergence can be used in the objective function, and various conditional expectation based constraints can be considered based on the statistics that need to be preserved. Analysis of the co-clustering problem leads to the minimum Bregman information principle, which generalizes the maximum entropy principle, and yields an elegant meta-algorithm that is guaranteed to achieve local optimality. Our methodology yields new algorithms and also encompasses several previously known clustering and co-clustering algorithms based on alternate minimization.
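
The alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm as published: it assumes the only statistics preserved are the co-cluster (block) means, and every name in it (`bregman`, `block_means`, `cocluster`, the toy matrix `Z`) is hypothetical. A Bregman divergence is built from a convex function `phi` and its derivative `dphi` as d(x, y) = phi(x) - phi(y) - dphi(y)(x - y). Row and column assignments are then updated in turn, and the objective should not increase from one sweep to the next.

```python
import math
import random

def bregman(phi, dphi):
    """Build d(x, y) = phi(x) - phi(y) - dphi(y) * (x - y) for scalar x, y."""
    return lambda x, y: phi(x) - phi(y) - dphi(y) * (x - y)

# Two standard choices (assumed here for illustration):
squared_error = bregman(lambda x: x * x, lambda x: 2.0 * x)              # (x - y)^2
i_divergence = bregman(lambda x: x * math.log(x), lambda x: math.log(x) + 1.0)
# i_divergence(x, y) = x*log(x/y) - x + y, defined for x, y > 0

def block_means(Z, rows, cols, k, l):
    """Mean of each (row cluster, column cluster) block; empty blocks get the global mean."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(Z):
        for j, z in enumerate(row):
            sums[rows[i]][cols[j]] += z
            counts[rows[i]][cols[j]] += 1
    global_mean = sum(map(sum, Z)) / (len(Z) * len(Z[0]))
    return [[sums[g][h] / counts[g][h] if counts[g][h] else global_mean
             for h in range(l)] for g in range(k)]

def objective(Z, rows, cols, means, div):
    """Total divergence between each entry and its block mean."""
    return sum(div(z, means[rows[i]][cols[j]])
               for i, row in enumerate(Z) for j, z in enumerate(row))

def cocluster(Z, k, l, div=squared_error, iters=10, seed=0):
    """Alternate row and column reassignment; returns assignments and objective history."""
    rng = random.Random(seed)
    m, n = len(Z), len(Z[0])
    rows = [rng.randrange(k) for _ in range(m)]
    cols = [rng.randrange(l) for _ in range(n)]
    history = []
    for _ in range(iters):
        # Row step: with block means fixed, move each row to its cheapest row cluster.
        means = block_means(Z, rows, cols, k, l)
        for i in range(m):
            rows[i] = min(range(k), key=lambda g: sum(
                div(Z[i][j], means[g][cols[j]]) for j in range(n)))
        # Column step: recompute means, then move each column likewise.
        means = block_means(Z, rows, cols, k, l)
        for j in range(n):
            cols[j] = min(range(l), key=lambda h: sum(
                div(Z[i][j], means[rows[i]][h]) for i in range(m)))
        means = block_means(Z, rows, cols, k, l)
        history.append(objective(Z, rows, cols, means, div))
    return rows, cols, history

# Toy matrix with a 2 x 2 block structure (made-up data, strictly positive).
Z = [[1.0, 1.0, 5.0, 5.0],
     [1.0, 1.0, 5.0, 5.0],
     [9.0, 9.0, 2.0, 2.0],
     [9.0, 9.0, 2.0, 2.0]]
rows, cols, history = cocluster(Z, k=2, l=2)
print(rows, cols, history[-1])
```

Passing `div=i_divergence` runs the same loop with the generalized I-divergence, provided all matrix entries are positive. Because the result is only a local optimum, it depends on the random initial assignment, so different seeds may give different co-clusterings.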
