## Model-based overlapping clustering (2005)

Venue: In KDD

Citations: 29 (6 self)

### BibTeX

@INPROCEEDINGS{Banerjee05model-basedoverlapping,
  author = {Arindam Banerjee and Chase Krumpelman and Joydeep Ghosh},
  title = {Model-based overlapping clustering},
  booktitle = {In KDD},
  year = {2005},
  pages = {532--537},
  publisher = {ACM Press}
}

### Abstract

While the vast majority of clustering algorithms are partitional, many real world datasets have inherently overlapping clusters. Several approaches to finding overlapping clusters have come from work on analysis of biological datasets. In this paper, we interpret an overlapping clustering model proposed by Segal et al. [23] as a generalization of Gaussian mixture models, and we extend it to an overlapping clustering model based on mixtures of any regular exponential family distribution and the corresponding Bregman divergence. We provide the necessary algorithm modifications for this extension, and present results on synthetic data as well as subsets of 20-Newsgroups and EachMovie datasets.

### Citations

8089 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ... log-likelihood of the joint distribution, we have $\max_{M,A} \log p(X,M,A) \equiv \min_{M,A} \left[ \frac{1}{2\sigma^2} \|X - MA\|^2 - \log p(M) \right]$. To find the value of the hidden variables M and A, the SBK model uses an EM approach =-=[12]-=-. The E step involves finding the best estimates of the binary genes-process memberships M. The M step involves computing the prior probability of gene membership in each process p(M) and the process-... |

1363 | Generalized linear models
- McCullagh, Nelder
- 1990
Citation Context ...te concepts of boosting and logistic regression [11]. More recently, they have been studied in the context of clustering [2]. Our formulation has some similarities to generalized linear models (GLMs) =-=[21, 10]-=-. However, there are a few very important differences. In GLMs [21], a multidimensional regression problem of the form $d_\phi(Y, f(BZ))$ is solved where Z is the (known) input variable, Y is the (known) r... |

734 | Algorithms for non-negative matrix factorization
- Lee, Seung
- 2001
Citation Context ...ergence or un-normalized relative entropy, the problem $\min_A d_I(X, MA) = \min_A \sum_{i,j} \left( X_{ij} \log \frac{X_{ij}}{(MA)_{ij}} - X_{ij} + (MA)_{ij} \right)$ (3) has been studied as a non-negative matrix factorization technique =-=[19]-=-. The optimal update for A for given X, M is multiplicative and is given by $A_h^j \leftarrow A_h^j \, \frac{\sum_i M_i^h X_i^j / (MA)_i^j}{\sum_i M_i^h}$. In order to prevent a division by 0, it makes sense to use $\max((MA)_i^j, \varepsilon)$ and ma... |
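
The multiplicative update quoted in this context can be illustrated with a small pure-Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names (`matmul`, `i_divergence`, `update_A`) are hypothetical, matrices are plain nested lists, and clamping the denominator with `eps` is an assumption, since the snippet is truncated before it says what the second `max` protects.

```python
import math


def matmul(M, A):
    """Plain nested-list matrix product: (n x k) times (k x m)."""
    k, m = len(A), len(A[0])
    return [[sum(row[h] * A[h][j] for h in range(k)) for j in range(m)]
            for row in M]


def i_divergence(X, Y):
    """Unnormalized relative entropy: sum_ij X_ij log(X_ij / Y_ij) - X_ij + Y_ij."""
    total = 0.0
    for x_row, y_row in zip(X, Y):
        for x, y in zip(x_row, y_row):
            if x > 0:  # the x log x term is taken as 0 when x == 0
                total += x * math.log(x / y)
            total += y - x
    return total


def update_A(X, M, A, eps=1e-12):
    """One multiplicative step: A[h][j] *= sum_i M[i][h] X[i][j] / (MA)[i][j] / sum_i M[i][h]."""
    MA = matmul(M, A)
    n, k, m = len(M), len(A), len(A[0])
    new_A = [row[:] for row in A]
    for h in range(k):
        # Assumption: the column sum of M is clamped to eps to avoid dividing by 0.
        denom = max(sum(M[i][h] for i in range(n)), eps)
        for j in range(m):
            numer = sum(M[i][h] * X[i][j] / max(MA[i][j], eps) for i in range(n))
            new_A[h][j] = A[h][j] * numer / denom
    return new_A
```

With M held fixed, repeatedly calling `update_A` should not increase `i_divergence(X, matmul(M, A))`, which is the usual property of this family of multiplicative updates.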

601 | Numerical methods for least squares problems
- Bjorck
- 1996
Citation Context ...en the Bregman divergence is the squared loss, the corresponding problem is just the bounded least squares (BLS) problem given by $\min_{M: 0 \le M_{ih} \le 1} \|X - MA\|^2$, for which there are well studied algorithms =-=[6]-=-. Now, from the real bounded matrix M, one can get the cluster membership by rounding $M_{ih}$ values either by proper thresholding [23] or randomized rounding. If $k_0$ clusters get turned “on” for a particu... |
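
The two rounding options mentioned in this context can be sketched as follows. This is a minimal illustration, not the paper's procedure: the function names are hypothetical, the cutoff `tau=0.5` is an assumed default (the snippet only says "proper thresholding"), and randomized rounding is taken in its simplest form, where each relaxed value is used as the probability of switching that membership on.

```python
import random


def threshold_round(M_real, tau=0.5):
    """Set a membership to 1 when its relaxed value reaches the (assumed) cutoff tau."""
    return [[1 if v >= tau else 0 for v in row] for row in M_real]


def randomized_round(M_real, rng):
    """Set each membership to 1 with probability equal to its relaxed value in [0, 1]."""
    return [[1 if rng.random() < v else 0 for v in row] for row in M_real]


# Example relaxed memberships for two points and three clusters.
M_real = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]
M_hard = threshold_round(M_real)
M_rand = randomized_round(M_real, random.Random(0))
```

Thresholding is deterministic, while randomized rounding gives a different binary matrix per draw, so in practice one might keep the draw with the lowest loss.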

510 | Learning probabilistic relational models
- Getoor, Friedman, et al.
- 2001
Citation Context ...oach to overlapping clustering introduced by Segal et al. [23], hereafter referred to as the SBK model. The original method was presented as a specialization of a Probabilistic Relational Model (PRM) =-=[14]-=- and was specifically designed for clustering gene expression data. We present an alternative view of their basic approach as a generalization of standard mixture models. While the original model maxi... |

454 | Mining generalized association rules
- Srikant, Agrawal
- 1995
Citation Context ...sters, and the similarities of our formulation to the subset sum problem, we propose the algorithm dynamicM (Algorithm 1). The algorithm is motivated by the Apriori class of algorithms in data mining =-=[34]-=- and Shapley value computation in co-operative game theory [22, 14]. It is important to note that no theoretical claim is being made regarding the optimality of dynamicM. The belief is that such an ef... |

438 | Newsweeder: Learning to filter netnews
- Lang
- 1995
Citation Context ...microarray gene expression data, it is appropriate to assign genes to multiple, overlapping clusters [33, 4]. In the popular 20-Newsgroups benchmark dataset used in text classification and clustering =-=[24]-=-, a fair number of the original articles were actually cross-posted to multiple newsgroups; the data was subsequently manipulated to produce disjoint categories. Ideally, a clustering algorithm applie... |

430 | Introduction to Probability Models
- Ross
- 2000
Citation Context ...s it can belong to is k, we first generate an n × k binary membership matrix M from a Rayleigh distribution using rejection sampling. For each point, we first sample a value from a Rayleigh distribution =-=[32]-=- with a mean of 2. The actual number of processes p for the point is obtained by adding 1 to the sample value, so that the mean number of processes to which a point is assigned is effectively 3. Note ... |

381 | Biclustering of expression data
- Cheng, Church
- 2000
Citation Context ...ng clustering has been primarily driven by the needs of microarray analysis. Several methods for obtaining overlapping gene clusters, including gene shaving [16] and mean square residue bi-clustering =-=[8]-=- have been proposed. Before the PRM based SBK model was proposed, one of the most notable efforts was the plaid model [18], wherein the gene-expression matrix was modeled as a superposition of sev... |

310 | Clustering with Bregman divergences
- Banerjee, Merugu, et al.
- 2005
Citation Context ... generalize it to work with any regular exponential family distribution, and corresponding Bregman divergences, thereby making the model applicable for a wide variety of clustering distance functions =-=[2]-=-. This generalization is critical to the effective application of the approach to high-dimensional sparse data, such as typically those encountered in text mining and recommender systems, where Gaussi... |

304 | Concept decompositions for large sparse text data using clustering
- Dhillon, Modha
- 2001
Citation Context ...similar-3; (2) news-related-3; and (3) news-different-3. Details of these datasets are outlined in [3]. The vector-space model of each data subset was created using standard text pre-processing methods =-=[13]-=-, and each data subset has 300 points in high-dimensional space (> 1000 words). In this case, I-divergence was again used as the Bregman divergence for overlapping clustering, with suitable Laplace sm... |

275 | Biclustering Algorithms for Biological Data Analysis
- Madeira, Oliveira
- 2004
Citation Context ...ring or co-clustering, i.e., simultaneous clustering of rows and columns, was suitable for such data sets since only certain groups of genes are co-expressed given a corresponding subset of conditions=-=[27]-=-. Several methods for obtaining overlapping gene clusters, including gene shaving [20] and mean square residue bi-clustering [10] have been proposed. Before the PRM based SBK model was proposed, one o... |

256 | Parallel Optimization: Theory, Algorithms, and Applications
- Censor, Zenios
- 1997
Citation Context ...was modeled as a superposition of several layers of plaids (subsets of genes and conditions). Bregman divergences were conceived and have been extensively studied in the convex optimization community =-=[7]-=-. Over the past few years, they have been successfully applied to a variety of ma... [Table fragment: F-measure, Precision, and Recall for MOC vs. Mixture; row small-synthetic: 0.64 ± 0.12, 0.36 ± 0.08, 0.83 ± 0.0...] |

204 | Logistic regression, AdaBoost and Bregman distances
- Collins, Schapire, et al.
- 2004
Citation Context ... 0.04 [Table 2: Results: dynamicM vs. Bounded Least Squares (with search) for synthetic data] machine learning issues, for example to unify seemingly disparate concepts of boosting and logistic regression =-=[11]-=-. More recently, they have been studied in the context of clustering [2]. Our formulation has some similarities to generalized linear models (GLMs) [21, 10]. However, there are a few very important di... |

188 | Probabilistic frame-based systems
- Koller, Pfeffer
- 1998
Citation Context ... and whose entry in row i and column j is represented as $X_{ij}$ or $X_i^j$. 2. BACKGROUND In this section, we give a brief introduction to the PRM-based SBK model. Probabilistic Relational Models (PRMs) =-=[18, 23]-=- extend the basic concepts of Bayesian networks into a framework for representing and reasoning with probabilistic relationships between entities in a relational structure. PRMs provide a very general... |

183 | A probabilistic framework for semi-supervised clustering
- Basu, Bilenko, et al.
Citation Context ... the following three data subsets were created with varying levels of overlap in the topics: (1) news-similar-3; (2) news-related-3; and (3) news-different-3. Details of these datasets are outlined in =-=[3]-=-. The vector-space model of each data subset was created using standard text pre-processing methods [13], and each data subset has 300 points in high-dimensional space (> 1000 words). In this case, I-d... |

128 | Plaid models for gene expression data
- Lazzeroni, Owen
- 2002
Citation Context ...lusters, including gene shaving [16] and mean square residue bi-clustering [8] have been proposed. Before the PRM based SBK model was proposed, one of the most notable efforts was the plaid model =-=[18]-=-, wherein the gene-expression matrix was modeled as a superposition of several layers of plaids (subsets of genes and conditions). Bregman divergences were conceived and have been extensively studied ... |

115 | ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns
- Hastie, Tibshirani, et al.
- 2000
Citation Context ...nted in [20]. Most recent work in overlapping clustering has been primarily driven by the needs of microarray analysis. Several methods for obtaining overlapping gene clusters, including gene shaving =-=[16]-=- and mean square residue bi-clustering [8] have been proposed. Before the PRM based SBK model was proposed, one of the most notable efforts was the plaid model [18], wherein the gene-expression ma... |

110 | Fuzzy models for pattern recognition
- Bezdek, Pal
- 1992
Citation Context ...odel [23]. 6. RELATED WORK Possibility theory, developed in the fuzzy logic community, allows an object to “belong” to multiple sets in the sense of having high membership values to more than one set =-=[5]-=-. In particular, unlike probabilities, the sum of membership values may be more than one [22]. One of the earlier works on overlapping clustering techniques with the possibility of not clustering all ... |

108 | A generalization of principal component analysis to the exponential family
- Collins, Dasgupta, et al.
- 2001
Citation Context ...te concepts of boosting and logistic regression [11]. More recently, they have been studied in the context of clustering [2]. Our formulation has some similarities to generalized linear models (GLMs) =-=[21, 10]-=-. However, there are a few very important differences. In GLMs [21], a multidimensional regression problem of the form $d_\phi(Y, f(BZ))$ is solved where Z is the (known) input variable, Y is the (known) r... |

98 | A generalized maximum entropy approach to Bregman co-clustering and matrix approximation
- Banerjee, Dhillon, et al.
Citation Context ...t the loss function is minimized. There can be two ways of coming up with an algorithm for updating M. The first one is to consider a real relaxation of the problem and allow M to take real values in =-=[0,1]-=-. For particular choices of the Bregman divergence, specific algorithms can be devised to solve the real relaxed version of the problem. For example, when the Bregman divergence is the squared loss, t... |

71 | Relative loss bounds for multidimensional regression problems
- Kivinen, Warmuth
- 2001
Citation Context ...in the context of clustering [2]. Our formulation has some similarities to, but a few very important differences with, a large class of models studied in the context of generalized linear models (GLMs) =-=[29, 12, 19, 21]-=-. In GLMs [29], a multidimensional regression problem of the form $d_\phi(Y, f(BZ))$ is solved where Z is the (known) input variable, Y is the (known) response and f is the so-called canonical link function de... |

52 | Computing Shapley values, manipulating value division schemes, and checking core membership in multi-issue domains
- Conitzer, Sandholm
- 2004
Citation Context ...m problem, we propose the algorithm dynamicM (Algorithm 1). The algorithm is motivated by the Apriori class of algorithms in data mining [34] and Shapley value computation in co-operative game theory =-=[22, 14]-=-. It is important to note that no theoretical claim is being made regarding the optimality of dynamicM. The belief is that such an efficient algorithm will work well in practice, as the empirical evid... |

47 | Problem decomposition and data reorganization by a clustering technique
- McCormick, Schweitzer, et al.
- 1972
Citation Context ...probabilities, the sum of membership values may be more than one [22]. One of the earlier works on overlapping clustering techniques with the possibility of not clustering all points was presented in =-=[20]-=-. Most recent work in overlapping clustering has been primarily driven by the needs of microarray analysis. Several methods for obtaining overlapping gene clusters, including gene shaving [16] and mea... |

36 | Decomposing gene expression into cellular processes
- Segal, Battle, et al.
- 2003
Citation Context ...usters. Several approaches to finding overlapping clusters have come from work on analysis of biological datasets. In this paper, we interpret an overlapping clustering model proposed by Segal et al. =-=[23]-=- as a generalization of Gaussian mixture models, and we extend it to an overlapping clustering model based on mixtures of any regular exponential family distribution and the corresponding Bregman dive... |

26 | On the Value of Private Information
- Kleinberg, Papadimitriou, et al.
- 2003
Citation Context ...et sum problem, we propose the algorithm dynamicM (Algorithm 1). The algorithm is motivated by the Apriori class of algorithms in data mining and Shapley value computation in co-operative game theory =-=[17]-=-. It is important to note that no theoretical claim is being made regarding the optimality of dynamicM. The belief is that such an efficient algorithm will work well in practice, as the empirical evid... |

26 | Randomized rounding
- Raghavan, Thompson
- 1987
Citation Context ...r which there are well studied algorithms [6]. Now, from the real bounded matrix M, one can get the cluster membership by rounding $M_{ih}$ values either by proper thresholding [33] or randomized rounding =-=[31]-=-. If $k_0$ clusters get turned “on” for a particular data point, the SBK model performs an explicit $2^{k_0}$ search over the “on” clusters in order to get improved results. Another alternative could be to keep ... |

25 | Information theoretic clustering of sparse co-occurrence data
- Dhillon, Guan
- 2003
Citation Context ... number of points compared to the dimensionality of the space. Clustering a small number of points in a high-dimensional space is a comparatively difficult task, as observed by clustering researchers =-=[16]-=-. The purpose of performing experiments on these subsets is to scale down the sizes of the datasets for computational reasons but at the same time not scale down the difficulty of the tasks. 5.1.1 Syn... |

21 | Generalized$^2$ linear$^2$ models
- Gordon
- 2003
Citation Context ...he case where both B and Z are unknown and one alternates between updating B and Z has been studied by Collins et al. [10] while extending PCA to the exponential families. Although several extensions =-=[15]-=- of the basic GLM model to matrix factorization have been studied, except for the well known instance of non-negative matrix factorization (NMF) using I-divergence [19], all formulations use the canon... |

21 | Applying the Multiple Cause Mixture Model to Text Categorization
- Sahami, Hearst, et al.
- 1996
Citation Context ...multiple, overlapping clusters [23, 4]. Similarly, when clustering documents into topic categories, documents may contain multiple relevant topics and an overlapping clustering might be more relevant =-=[22]-=-. In the 20-Newsgroups benchmark dataset, articles with multiple topics are cross posted to multiple newsgroups. Ideally, a clustering algorithm applied to this data would allow articles to be assigne... |

19 | Probabilistic discovery of overlapping cellular processes and their regulation
- Battle, Segal, et al.
Citation Context ...biology, genes often simultaneously participate in multiple processes; therefore, when clustering micro-array gene expression data, it is appropriate to assign genes to multiple, overlapping clusters =-=[23, 4]-=-. Similarly, when clustering documents into topic categories, documents may contain multiple relevant topics and an overlapping clustering might be more relevant [22]. In the 20-Newsgroups benchmark d... |

14 | Hard knapsack problems
- Chvátal
- 1980
Citation Context ...r a given problem. So, we focus on designing an efficient way of searching through the relevant possibilities using the second observation. The subset sum problem is one of the hard knapsack problems =-=[9]-=- that tries to solve the following: Given a set of k natural numbers $a_1, \ldots, a_k$ and a target number x, find a subset S of the numbers such that $\sum_{a_h \in S} a_h = x$. In a more realistic setting, one works with... |
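
The exact subset sum problem stated in this context can be illustrated with a textbook dynamic program over reachable sums. This is a generic sketch, not the paper's dynamicM algorithm; the function name `subset_sum` and the choice to return indices are illustrative assumptions.

```python
def subset_sum(numbers, target):
    """Return indices of a subset of `numbers` summing to `target`, or None.

    Assumes natural (positive integer) inputs, as in the problem statement.
    """
    # reachable maps an achievable sum to one list of indices that produces it.
    reachable = {0: []}
    for idx, a in enumerate(numbers):
        # Iterate over a snapshot so each number is used at most once per subset.
        for s, subset in list(reachable.items()):
            t = s + a
            if t <= target and t not in reachable:
                reachable[t] = subset + [idx]
    return reachable.get(target)


numbers = [3, 34, 4, 12, 5, 2]
chosen = subset_sum(numbers, 9)  # indices of some subset adding up to 9
```

The table of reachable sums grows with the target value, so this runs in pseudo-polynomial time; it does not contradict the hardness of the general problem.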

13 | Proximity function minimization using multiple Bregman projections, with applications to split feasibility and Kullback-Leibler distance minimization
- Byrne, Censor
Citation Context ...case of I-divergence or un-normalized relative entropy, the problem $\min_A d_I(X, MA) = \min_A \sum_{i,j} \left( X_{ij} \log \frac{X_{ij}}{(MA)_{ij}} - X_{ij} + (MA)_{ij} \right)$ (10) has been studied as a non-negative matrix factorization technique =-=[7, 26]-=-. The optimal update for A for given X, M is multiplicative and is given by $A_h^j \leftarrow A_h^j \, \frac{\sum_i M_i^h X_i^j / (MA)_i^j}{\sum_i M_i^h}$ (11). In order to prevent a divide by 0, it makes sense to use $\max((MA)_i^j, \varepsilon)$ and max ∑... |

12 | The multiple subset sum problem
- Caprara, Kellerer, et al.
- 2000
Citation Context ...lar, then the problem is to find $M_i^*$ such that $M_i^* = \arg\min_{M_i \in \{0,1\}^k} d_\phi(X_i, M_i A) = \arg\min_{M_i \in \{0,1\}^k} \sum_{j=1}^m d_\phi\left( X_i^j, \sum_{h=1}^k M_i^h A_h^j \right)$. [Footnote 1: The problem is different from the so-called multiple subset sum problem =-=[8]-=-.] Thus, there are m targets $X_i^1, \ldots, X_i^m$, and for each target $X_i^j$ the subset is to be chosen from $\{A_1^j, \ldots, A_k^j\}$. The total loss is the sum of the individual losses, and the problem is to find a s... |
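
To make the per-point objective in this context concrete, the sketch below enumerates all $2^k$ binary membership vectors for one data point and keeps the one with the smallest loss. It is an exhaustive baseline for illustration only, not the paper's dynamicM algorithm; the name `best_membership` is hypothetical, and squared loss is assumed as the Bregman divergence $d_\phi$.

```python
from itertools import product


def best_membership(x_row, A):
    """Exhaustively search M_i in {0,1}^k minimizing sum_j (x_j - sum_h M_i[h] * A[h][j])^2."""
    k, m = len(A), len(A[0])
    best, best_loss = None, float("inf")
    for bits in product((0, 1), repeat=k):
        # Reconstruction of the point as the sum of the rows of A switched on by bits.
        recon = [sum(bits[h] * A[h][j] for h in range(k)) for j in range(m)]
        loss = sum((x - r) ** 2 for x, r in zip(x_row, recon))
        if loss < best_loss:
            best, best_loss = bits, loss
    return best, best_loss


# Three cluster prototypes in two dimensions; the point is the sum of rows 0 and 2.
A = [[1, 0], [0, 2], [3, 3]]
bits, loss = best_membership([4, 3], A)
```

The cost grows as $2^k$ per point, which is why a cheaper search over the relevant possibilities is attractive once k is more than a handful of clusters.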

4 | EachMovie collaborative filtering data set
- McJones
- 1997
Citation Context ...lied to this data would allow articles to be assigned to multiple newsgroups and would rediscover the original cross-posted articles. In the popular EachMovie dataset used to test recommender systems =-=[30]-=-, many movies belong to more than one genre, such as “Aliens”, which is listed in the action, horror and science fiction genres. An overlapping clustering algorithm applied to this data should automat... |

3 | A re-examination of text categorization methods
- Yang, Liu
- 1999
Citation Context ...the class with the highest a posteriori probability. For example, when classifying documents from the Reuters data set version 3 using k-nearest neighbor, a relatively high value of k=45 was chosen in =-=[35]-=-. A document was assigned to every class for which the weighted sum of the neighbors belonging to that class exceeded an empirically determined threshold. Note that the weighted sum is proportional to... |