### Citations

12169 | Elements of information theory
- Cover, Thomas
- 2006
Citation Context ...ation [8, 4], i.e. to minimize I(X;Y) − I(X;Ŷ) = ∑_ŷ ∑_{y∈ŷ} p(y) KL(p(x|y) ‖ p(x|ŷ)) (1), where I(X;Y) is the mutual information between the random variables X and Y, and KL stands for the Kullback-Leibler divergence [1]. The above expression for the loss in mutual information suggests a “natural” divisive clustering algorithm (DITC), which iteratively (i) re-partitions the distributions p(x|y) by their closeness in K... |
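The loss-in-mutual-information objective in (1) can be sketched numerically. A minimal Python sketch, assuming a hard clustering of the Y values; the function names and the assignment-array representation are our own, not from the paper:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def mi_loss(pxy, assign):
    """Loss in mutual information I(X;Y) - I(X;Yhat) for a hard clustering
    of the columns (Y values) of the joint distribution matrix pxy.

    pxy    : (|X|, |Y|) joint probability matrix, entries sum to 1
    assign : length-|Y| integer array, assign[y] = cluster index of y
    """
    py = pxy.sum(axis=0)                 # marginal p(y)
    px_given_y = pxy / py                # column y holds p(x|y)
    loss = 0.0
    for c in np.unique(assign):
        cols = np.where(assign == c)[0]
        p_c = py[cols].sum()             # cluster mass p(yhat)
        # cluster distribution p(x|yhat): p(y)-weighted mean of member columns
        px_given_c = pxy[:, cols].sum(axis=1) / p_c
        for y in cols:
            loss += py[y] * kl(px_given_y[:, y], px_given_c)
    return loss
```

The loss is zero when every y is its own cluster and equals I(X;Y) when all y are merged into one cluster, consistent with equation (1).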

317 | Deterministic annealing for clustering, compression, classification, regression and related optimization problems
- Rose
- 1998
Citation Context ...DITC_prior is the same as DITC except that the cluster distributions are computed as in (2) and α is halved at every iteration. Our prior has the same influence as the temperature in deterministic annealing [7], through a slightly different mechanism: when the prior is large, all the p(x|ŷ)’s are uniform, i.e., the joint entropy H(X,Ŷ) is large, thus KL(p(x|y) ‖ p(x|ŷ)) is almost the same for all y and ŷ. As t... |
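The flattening effect described above can be illustrated with a small sketch. Equation (2) is not shown in this excerpt, so the uniform-mixture smoothing below is an assumption (as is every name in the snippet); it only shows how a large prior weight α drives every cluster distribution toward uniform, and how halving α anneals that away:

```python
import numpy as np

def smoothed_cluster_dist(member_cols, alpha):
    """Hypothetical stand-in for equation (2), which this excerpt omits:
    mix the empirical cluster distribution p(x|yhat) with the uniform
    distribution, using prior weight alpha in [0, 1].

    member_cols : (|X|, m) slice of the joint matrix for the cluster's members
    """
    p = member_cols.sum(axis=1)
    p = p / p.sum()                      # empirical p(x|yhat)
    n = p.size
    return (1.0 - alpha) * p + alpha * np.full(n, 1.0 / n)

# alpha = 1 makes every cluster distribution uniform, so all KL divergences
# coincide; halving alpha each iteration (alpha /= 2) acts like lowering the
# temperature in deterministic annealing.
```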

176 | Document clustering using word clusters via the information bottleneck method
- Slonim, Tishby
- 2000
Citation Context ... partitions before invoking the slower local search procedure; hence DITC PLS is our method of choice. We now compare our Algorithm DITC PLS with previously proposed information-theoretic algorithms. [9] proposed the use of an agglomerative algorithm that first clusters words, and then uses this clustered feature space to cluster documents using the same agglomerative information bottleneck method. M... |

134 | A divisive information-theoretic feature clustering algorithm for text classification
- Dhillon, Mallela, et al.
- 2003
Citation Context ... where non-negative co-occurrence data is available. A novel formulation poses the clustering problem as one in information theory: find the clustering that minimizes the loss in (mutual) information [8, 4]. This information-theoretic formulation leads to a “natural” divisive clustering algorithm that uses relative entropy as the measure of similarity and monotonically reduces the loss in mutual informa... |
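The divisive algorithm described here alternates a relative-entropy reassignment step with a cluster-distribution update, much like k-means. A minimal self-contained sketch under our own naming; the initialization and empty-cluster handling are assumptions, not the paper's exact pseudocode:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats; terms with p = 0 contribute nothing."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / (q[m] + eps))))

def ditc(pxy, k, iters=20, seed=0):
    """Divisive information-theoretic clustering sketch: partition the
    columns (Y values) of the joint matrix pxy into k clusters by
    (i) reassigning each y to the cluster whose distribution p(x|yhat)
    is closest in KL divergence, then (ii) recomputing the cluster
    distributions as p(y)-weighted means of their members."""
    rng = np.random.default_rng(seed)
    py = pxy.sum(axis=0)
    px_given_y = pxy / py
    assign = rng.integers(0, k, size=pxy.shape[1])
    for _ in range(iters):
        # (ii) cluster distributions p(x|yhat)
        centers = np.empty((pxy.shape[0], k))
        for c in range(k):
            cols = np.where(assign == c)[0]
            if cols.size == 0:                      # re-seed an empty cluster
                cols = np.array([rng.integers(pxy.shape[1])])
            centers[:, c] = pxy[:, cols].sum(axis=1) / py[cols].sum()
        # (i) re-partition by closeness in KL divergence
        new = np.array([
            int(np.argmin([kl(px_given_y[:, y], centers[:, c])
                           for c in range(k)]))
            for y in range(pxy.shape[1])
        ])
        if np.array_equal(new, assign):
            break
        assign = new
    return assign
```

On co-occurrence data with two clearly separated column groups, the sketch recovers the grouping in a few iterations.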

122 | Unsupervised document classification using sequential information maximization
- Slonim, Friedman, et al.
- 2002
Citation Context ... where non-negative co-occurrence data is available. A novel formulation poses the clustering problem as one in information theory: find the clustering that minimizes the loss in (mutual) information [8, 4]. This information-theoretic formulation leads to a “natural” divisive clustering algorithm that uses relative entropy as the measure of similarity and monotonically reduces the loss in mutual informa... |

62 | Iterative clustering of high dimensional text data augmented by local search
- Dhillon, Guan, et al.
- 2002
Citation Context ...hus KL(p(x|y) ‖ p(x|ŷ)) is almost the same for all y and ŷ. As the prior decreases, H(X,Ŷ) decreases. To further improve our algorithm, we turn to a local search strategy, called first variation in [3], that allows us to escape undesirable local minima, especially in the case of high dimensionality. Precisely, a first variation of a partition {ŷ_j}_{j=1}^k is a partition {ŷ′_j}_{j=1}^k obtained by rem... |
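A first variation, as described, moves a single y to a different cluster and keeps the move only if it lowers the objective. A minimal sketch of one local-search sweep, with our own naming; the paper's chain-of-moves bookkeeping is more involved than this:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats; terms with p = 0 contribute nothing."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / (q[m] + eps))))

def mi_loss(pxy, assign, k):
    """Objective: loss in mutual information for a hard clustering."""
    py = pxy.sum(axis=0)
    loss = 0.0
    for c in range(k):
        cols = np.where(assign == c)[0]
        if cols.size == 0:
            continue
        center = pxy[:, cols].sum(axis=1) / py[cols].sum()  # p(x|yhat)
        for y in cols:
            loss += py[y] * kl(pxy[:, y] / py[y], center)
    return loss

def first_variation_pass(pxy, assign, k):
    """One sweep of first-variation local search: for each y, try moving
    it to every other cluster and keep the best move if it strictly
    lowers the objective."""
    assign = assign.copy()
    for y in range(pxy.shape[1]):
        best = mi_loss(pxy, assign, k)
        best_c = assign[y]
        for c in range(k):
            if c == assign[y]:
                continue
            trial = assign.copy()
            trial[y] = c
            val = mi_loss(pxy, trial, k)
            if val < best:
                best, best_c = val, c
        assign[y] = best_c
    return assign
```

Because single-point moves can change the objective even when a full reassignment step is stuck, sweeps like this can escape the local minima mentioned above.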

46 | NewsWeeder: Learning to filter netnews
- Lang
- 1995
Citation Context ...information-theoretic algorithm applied to the task of clustering document collections using word-document co-occurrence data. For our test data, we use various subsets of the 20-newsgroup data (NG20) [6] and the SMART collection (ftp://ftp.cs.cornell.edu/pub/smart). NG20 consists of approximately 20,000 newsgroup postings collected from different usenet newsgroups. We report results on NG20 and vario... |

34 | Information theoretic clustering of sparse co-occurrence data
- Dhillon, Guan
- 2003
Citation Context ...earch, i.e., it iteratively runs DITC_prior and a chain of first variations till it converges. Lack of space prevents us from giving a more detailed description of the algorithm, which may be found in [2].

DITC results:

|    | MED | CRAN | CISI |
|----|-----|------|------|
| ŷ1 | 847 | 41   | 275  |
| ŷ2 | 142 | 954  | 86   |
| ŷ3 | 44  | 405  | 1099 |

DITC_prior results:

|    | MED  | CRAN | CISI |
|----|------|------|------|
| ŷ1 | 1016 | 1    | 2    |
| ŷ2 | 1    | 1389 | 1    |
| ŷ3 | 16   | 9    | 1457 |

Table 1. Confusion matrices for … documents, … words (CLA... |
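The confusion matrices in Table 1 can be summarized by cluster purity, the fraction of documents falling in each cluster's dominant class. A quick check on the table's own numbers (the variable names are ours):

```python
import numpy as np

# Rows are clusters yhat1..yhat3; columns are the classes MED, CRAN, CISI.
ditc = np.array([[847,  41,  275],
                 [142, 954,   86],
                 [ 44, 405, 1099]])
ditc_prior = np.array([[1016,    1,    2],
                       [   1, 1389,    1],
                       [  16,    9, 1457]])

def purity(cm):
    """Fraction of points lying in their cluster's majority class."""
    return cm.max(axis=1).sum() / cm.sum()
```

On these numbers DITC attains purity of roughly 0.74 while DITC_prior reaches over 0.99, consistent with the claim that the annealing prior and local search avoid poor local minima.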