## Data clustering by Markovian relaxation and the Information Bottleneck Method (2000)


Citations: 68 (8 self)

### BibTeX

@INPROCEEDINGS{Tishby00dataclustering,
  author = {Naftali Tishby and Noam Slonim},
  title = {Data clustering by Markovian relaxation and the Information Bottleneck Method},
  booktitle = {},
  year = {2000},
  pages = {640--646},
  publisher = {MIT Press}
}

### Abstract

We introduce a new, non-parametric and principled, distance-based clustering method. This method combines a pairwise-based approach with a vector-quantization method, which provides a meaningful interpretation of the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examining the decay of mutual information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation and are then extracted using the information bottleneck method. These clusters capture the information about the initial point of the relaxation in the most effective way. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution.
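The relaxation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential weighting `exp(-lam * d)`, the scale parameter `lam`, the uniform prior over starting points, the row-stochastic convention, and all function names are assumptions made for illustration. The sketch builds a one-step transition matrix from pairwise distances, relaxes it by repeated squaring, and records the mutual information between the starting point and the position at time t.

```python
import numpy as np

def transition_matrix(dist, lam):
    """Row-stochastic matrix with P[i, j] proportional to exp(-lam * dist[i, j]).

    The exponential form and the parameter `lam` are illustrative assumptions.
    """
    w = np.exp(-lam * np.asarray(dist, dtype=float))
    return w / w.sum(axis=1, keepdims=True)

def mutual_information(P_t, prior):
    """Mutual information (in nats) between start state and state at time t.

    Computed as the prior-weighted KL divergence of each row of P_t
    from the marginal distribution over states at time t.
    """
    marginal = prior @ P_t  # marginal[j] = sum_i prior[i] * P_t[i, j]
    # Ratio is set to 1 (log = 0) wherever P_t is exactly zero.
    ratio = np.divide(P_t, marginal, out=np.ones_like(P_t), where=P_t > 0)
    return float(np.sum(prior[:, None] * P_t * np.log(ratio)))

def relaxation_curve(dist, lam=1.0, steps=8):
    """Mutual information at times t = 1, 2, 4, ..., 2**(steps - 1)."""
    P_t = transition_matrix(dist, lam)
    n = P_t.shape[0]
    prior = np.full(n, 1.0 / n)  # assumed uniform prior over starting points
    curve = []
    for _ in range(steps):
        curve.append(mutual_information(P_t, prior))
        P_t = P_t @ P_t  # squaring doubles the relaxation time
    return curve

# Two well-separated groups of points on a line (toy data).
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(x[:, None] - x[None, :])
for k, info in enumerate(relaxation_curve(dist, lam=2.0)):
    print(f"t = {2 ** k:4d}   I(t) = {info:.4f} nats")
```

On toy data like this, the curve would be expected to drop quickly and then linger near a plateau while the two groups remain distinguishable, which is one way to read the "quasi-stable structures" mentioned in the abstract. Extracting the clusters themselves with the information bottleneck method is not covered by this sketch.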

### Citations

8593 | Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ... $D_{KL}[P^t_{i,j} \,\|\, p^t_i]$, (4) where $p_j$ is the prior probability of the states, and $p^t_i = \sum_j p^t_{i,j}\, p_j$ is the unconditioned probability of $x_i$ at time $t$. The $D_{KL}$ is the Kullback-Leibler divergence [4], defined as $D_{KL}[p \,\|\, q] \equiv \sum_y p(y) \log \frac{p(y)}{q(y)}$, which is the information-theoretic measure of similarity of distributions. Since all the rows $p^t_{i,j}$ relax to the stationary distribution, this divergence goes to zero as $t \to \infty$. W...

970 | The use of multiple measurements in taxonomic problems
- Fisher
- 1936
Citation Context ... identify the information preserving clusters. 5 More examples. We applied our method to several 'standard' clustering problems and obtained very good results. The first one was the famous "iris data" [7], on which we easily obtained just 5 misclassified points. A more interesting application was obtained on well known gene expression data, the Colon cancer data set provided by Alon et al. [1]. This da...

731 | Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays
- Alon, Barkai, et al.
- 1999
Citation Context ...s data" [7], on which we easily obtained just 5 misclassified points. A more interesting application was obtained on well known gene expression data, the Colon cancer data set provided by Alon et al. [1]. This data set consists of 62 tissue samples, out of which 22 came from tumors and the rest from "normal" biopsies of colon parts of the same patients. Gene expression levels were given for 2000 genes (ol...

548 | Distributional clustering of English words
- Pereira, Tishby, et al.
- 1993
Citation Context ...ve information. The problem of self-organization of the members of a set $X$ based on the similarity of the conditional distributions of the members of another set, $Y$, $\{p(y|x)\}$, was first introduced in [9] and was termed "distributional clustering". This question was recently shown in [12] to be a specific case of a much more fundamental problem: What are the features of the variable $X$ that are relevan...

437 | The information bottleneck method
- Tishby, Pereira, et al.
- 1999
Citation Context ...ter, that enables prediction of the location on the graph at time $t$, with similar accuracy? The answer to this question is naturally provided via the recently introduced information bottleneck method [12, 11]. 3 Clusters that preserve information. The problem of self-organization of the members of a set $X$ based on the similarity of the conditional distributions of the members of another set, $Y$, $\{p(y|x)\}$, ...

248 | Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
- Rose
- 1998
Citation Context ...n Bottleneck Method. The original approach to the solution of the resulting equations, used already in [9], was based on an analogy with the "deterministic annealing" (DA) approach to clustering (see [10, 8]). This is a top-down hierarchical algorithm that starts from a single cluster and undergoes a cascade of cluster splits which are determined stochastically (as phase transitions) into a "soft" (fuzzy...

198 | Pairwise data clustering by deterministic annealing
- Hofmann, Buhmann
- 1997
Citation Context ...n Bottleneck Method. The original approach to the solution of the resulting equations, used already in [9], was based on an analogy with the "deterministic annealing" (DA) approach to clustering (see [10, 8]). This is a top-down hierarchical algorithm that starts from a single cluster and undergoes a cascade of cluster splits which are determined stochastically (as phase transitions) into a "soft" (fuzzy...

188 | Tissue classification with gene expression profiles
- Ben-Dor, Bruhn, et al.
- 2000

154 | Agglomerative information bottleneck
- Slonim, Tishby
- 1999
Citation Context ...ter, that enables prediction of the location on the graph at time $t$, with similar accuracy? The answer to this question is naturally provided via the recently introduced information bottleneck method [12, 11]. 3 Clusters that preserve information. The problem of self-organization of the members of a set $X$ based on the similarity of the conditional distributions of the members of another set, $Y$, $\{p(y|x)\}$, ...

57 | Data clustering using a model granular magnet. Neural Computation, 9:1805–1842
- Blatt, Wiseman, et al.
- 1997
Citation Context ...ion and sample to sample fluctuations is not well defined. Algorithms that use only the pairwise distances, without explicit use of the distance measure itself, employ statistical mechanics analogies [3] or collective graph theoretical properties [6], etc. The points are then grouped based on some global criteria, such as connected components, small cuts, or minimum alignment energy. Such algorithms ...

34 | A randomized algorithm for pairwise clustering
- Gdalyahu, Weinshall, et al.
- 1998
Citation Context ...ll defined. Algorithms that use only the pairwise distances, without explicit use of the distance measure itself, employ statistical mechanics analogies [3] or collective graph theoretical properties [6], etc. The points are then grouped based on some global criteria, such as connected components, small cuts, or minimum alignment energy. Such algorithms are sometimes computationally inefficient and i...

20 | Clustering analysis and display of genome wide expression patterns
- Eisen, Spellman, et al.
- 1998
Citation Context ...ression levels were given for 2000 genes (oligonucleotides), resulting in a 62 × 2000 matrix. As done in other studies of this data, we calculated the Pearson correlation, $K_p(u, v)$ (see, e.g., [5]), between the $u$ and $v$ expression rows and then transformed this measure to distances through the simple transformation defined by $d(u, v) = \frac{1 - K_p(u, v)}{1 + K_p(u, v)}$. In figure 1 (right panel) we...
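The correlation-to-distance transformation quoted in the last citation context, d(u, v) = (1 - K_p(u, v)) / (1 + K_p(u, v)), can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `pearson_distance`, the use of `numpy.corrcoef`, and the handling of perfectly anti-correlated rows (distance set to infinity) are assumptions.

```python
import numpy as np

def pearson_distance(expr):
    """Pairwise distances d(u, v) = (1 - K_p) / (1 + K_p) between rows of `expr`.

    K_p is the Pearson correlation between two rows. Rows with K_p = -1
    get an infinite distance (an assumption of this sketch).
    """
    k = np.corrcoef(np.asarray(expr, dtype=float))  # rows are treated as variables
    k = np.clip(k, -1.0, 1.0)  # guard against rounding slightly outside [-1, 1]
    with np.errstate(divide="ignore"):
        return (1.0 - k) / (1.0 + k)

# Toy "expression" matrix: 3 samples (rows) by 4 genes (columns).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, -1.0, -1.0, 1.0],  # uncorrelated with the first row
    [2.0, 4.0, 6.0, 8.5],    # strongly correlated with the first row
])
print(pearson_distance(expr))
```

The transformation maps a correlation of 1 to distance 0 and a correlation of 0 to distance 1, and it grows without bound as the correlation approaches -1, so strongly anti-correlated samples end up far apart.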