Co-clustering is a powerful data mining technique with varied applications such as text clustering, microarray analysis, and recommender systems. Recently, an information-theoretic co-clustering approach applicable to empirical joint probability distributions was proposed. In many situations, co-clustering of more general matrices is desired. In this paper, we present a substantially generalized co-clustering framework wherein any Bregman divergence can be used in the objective function, and various conditional-expectation-based constraints can be considered, depending on the statistics that need to be preserved. Analysis of the co-clustering problem leads to the minimum Bregman information principle, which generalizes the maximum entropy principle and yields an elegant meta-algorithm that is guaranteed to achieve local optimality. Our methodology yields new algorithms and also encompasses several previously known clustering and co-clustering algorithms based on alternate minimization.
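As a brief illustration of the central object in the abstract, the standard definition of a Bregman divergence for a differentiable, strictly convex function phi is d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. The sketch below is not taken from the paper; the function names and the example vectors are assumptions chosen for illustration. It shows two familiar instances: the squared Euclidean distance (phi is the squared norm) and the KL divergence between probability vectors (phi is the negative entropy).

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y> (standard definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(phi(x) - phi(y) - np.dot(grad_phi(y), x - y))

# phi(v) = ||v||^2 gives the squared Euclidean distance.
def sq_norm(v):
    return float(np.dot(v, v))

def sq_norm_grad(v):
    return 2.0 * v

# phi(v) = sum_i v_i log v_i (negative entropy) gives the generalized
# I-divergence, which equals KL divergence when x and y both sum to 1.
# Assumes strictly positive entries.
def neg_entropy(v):
    return float(np.sum(v * np.log(v)))

def neg_entropy_grad(v):
    return np.log(v) + 1.0

# Hypothetical example vectors (both are probability distributions).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, x, y)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, x, y)
```

Passing phi and its gradient as separate callables keeps the sketch free of automatic differentiation; swapping in a different convex function changes the divergence without touching the rest of the code, which is the sense in which a single framework can cover several objective functions.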