## Entropy-Based Criterion in Categorical Clustering (2004)

Venue: | Proc. of Intl. Conf. on Machine Learning (ICML) |

Citations: | 23 - 3 self |

### BibTeX

@INPROCEEDINGS{Li04entropy-basedcriterion,
    author = {Tao Li and Sheng Ma and Mitsunori Ogihara},
    title = {Entropy-Based Criterion in Categorical Clustering},
    booktitle = {Proc. of Intl. Conf. on Machine Learning (ICML)},
    year = {2004},
    pages = {536--543}
}

### Abstract

Entropy-type measures for the heterogeneity of clusters have been used for a long time. This paper studies the entropy-based criterion in clustering categorical data. It first shows that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models and establishes the connection between the criterion and the approach based on dissimilarity coefficients.

### Citations

8892 | Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ...ropy-based criterion and other criteria. The relations of the entropy-based criterion to Minimum Description Length (MDL) / Minimum Message Length (MML) and to rate distortion theory can be found in (Cover & Thomas, 1991; Baxter & Oliver, 1994). [Figure 1 labels: MDL/MML, Rate Distortion, Code Length, Average Distortion, Entropy Criterion, Order Relationships, Generalized Entropy, Mutual Information between Unconditional Density and Partit...]

444 | Simulation and the Monte Carlo Method
- Rubinstein
- 1981
Citation Context .... We then perform an iterative Monte-Carlo process to find the optimal partition. The clustering procedure is the following Algorithm 1. Algorithm 1 uses a Monte-Carlo method to perform optimization (Rubinstein, 1981). Randomly picking a data point x and putting it into another cluster is a trial step of modifying the parameters θ(j|k). We then check whether the entropy criterion is decreased, and if so, we accep...
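The trial-and-accept loop described in this context can be sketched as follows. This is a minimal illustration and not the paper's Algorithm 1: the criterion used here (the size-weighted sum of per-attribute Shannon entropies within each cluster) is an assumption, and the names `cluster_entropy`, `expected_entropy`, and `monte_carlo_cluster`, along with the `trials` and `seed` parameters, are hypothetical.

```python
import math
import random
from collections import Counter


def cluster_entropy(rows):
    """Sum over attributes of the Shannon entropy (in bits) of one cluster."""
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def expected_entropy(data, labels, k):
    """Assumed criterion: cluster entropies weighted by relative cluster size."""
    n = len(data)
    score = 0.0
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        score += len(members) / n * cluster_entropy(members)
    return score


def monte_carlo_cluster(data, k, trials=2000, seed=0):
    """Move one random point per trial; keep the move only if the criterion drops.

    Requires k >= 2 so that a different target cluster always exists.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    best = expected_entropy(data, labels, k)
    for _ in range(trials):
        i = rng.randrange(len(data))
        old = labels[i]
        labels[i] = rng.choice([c for c in range(k) if c != old])
        score = expected_entropy(data, labels, k)
        if score < best:
            best = score      # accept the trial move
        else:
            labels[i] = old   # reject and restore the previous assignment
    return labels, best
```

Because a move is accepted only when the criterion strictly decreases, the returned score never exceeds the score of the random starting partition, but the search can stop in a local minimum. Recomputing the criterion from scratch on every trial keeps the sketch short; an incremental update of the per-cluster counts would be the natural optimization.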

351 | ROCK: A robust clustering algorithm for categorical attributes
- Guha, Rastogi, et al.
- 1999
Citation Context ...igin and the color of eyes in demographic data. Many algorithms have been developed for clustering categorical data, e.g., (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997). Entropy-type measures for similarity among objects have been used from early on. In this paper, we show that the entropy-based clustering criterion can be formally derived ...

256 | Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering
- McCallum
- 1996
Citation Context ...anization of the posted article are ignored. In all our experiments, we first select the top 200 words by mutual information with class labels. The feature selection is done with the rainbow package (McCallum, 1996). In our experiments, we compare the performance of our entropy-based method with the popular vector space variant of the partitioning algorithms provided in the CLUTO package (Zhao & Karypis, 2001)....

166 | Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery
- Huang
- 1998
Citation Context ...of such include the country of origin and the color of eyes in demographic data. Many algorithms have been developed for clustering categorical data, e.g., (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997). Entropy-type measures for similarity among objects have been used from early on. In this paper, we show that the entropy-based cluste...

148 | Clustering categorical data: An approach based on dynamical systems
- Gibson, Kleinberg, et al.
Citation Context ... numerical. Examples of such include the country of origin and the color of eyes in demographic data. Many algorithms have been developed for clustering categorical data, e.g., (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997). Entropy-type measures for similarity among objects have been used from early on. In this paper, we show that the entropy...

95 | CACTUS – Clustering Categorical Data Using Summaries
- Ganti, Gehrke, et al.
- 1999
Citation Context ...de the country of origin and the color of eyes in demographic data. Many algorithms have been developed for clustering categorical data, e.g., (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997). Entropy-type measures for similarity among objects have been used from early on. In this paper, we show that the entropy-based clustering criterion can b...

90 | Quantification method of classification processes: Concept of structural α-entropy
- Havrda, Charvát
- 1967
Citation Context ... $x_{i',j}|$ $n_k\rho_k^{(j)}$ $n_k(1-\rho_k^{(j)})$ $n_k\rho_k^{(j)}(1-\rho_k^{(j)})$. Here for each $k$, $1 \le k \le K$, and for each $j$, $1 \le j \le r$, $\rho_k^{(j)}$ is the probability that the $j$-th attribute is 1 in $C_k$. Havrda and Charvat (Havrda & Charvat, 1967) proposed a generalized entropy of degree $s$, $s > 0$ and $s \neq 1$, for a discrete probability distribution $Q = (q_1, q_2, \ldots, q_n)$: $H^s(Q) = (2^{(1-s)} - 1)^{-1}\left(\sum_{i=1}^{n} q_i^s - 1\right)$. It holds that $\lim_{s\to 1}$...
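The generalized entropy of degree s quoted in this context is easy to evaluate numerically. The sketch below is illustrative only; the function names `havrda_charvat_entropy` and `shannon_entropy_bits` are hypothetical. It also checks the standard limit, which the truncated context begins to state: as s approaches 1, the degree-s entropy tends to the Shannon entropy measured in bits.

```python
import math


def havrda_charvat_entropy(q, s):
    """H^s(Q) = (2^(1-s) - 1)^(-1) * (sum_i q_i^s - 1), for s > 0 and s != 1."""
    if s <= 0 or s == 1:
        raise ValueError("degree s must satisfy s > 0 and s != 1")
    return (sum(p ** s for p in q) - 1.0) / (2.0 ** (1.0 - s) - 1.0)


def shannon_entropy_bits(q):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in q if p > 0)
```

For s = 2 the formula reduces to 2(1 − Σ q_i²), a rescaled Gini-style impurity, and a fair coin has entropy 1 for every admissible degree s.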

82 | Mathematical Taxonomy
- Jardine, Sibson
- 1971
Citation Context ...$+\beta d$, where $\alpha > 0$ and $\beta \ge 0$. Dissimilarity measures can be transformed into a similarity function by simple transformations such as adding 1 and inverting, dividing by 2 and subtracting from 1, etc. (Jardine & Sibson, 1971). If the joint absence of the attribute is ignored, i.e., $\beta$ is set to 0, then the binary dissimilarity measure can be generally written as $D(a, b, c, d) = \frac{b+c}{\alpha a+b+c}$, where $\alpha > 0$. Table 1 shows seve...
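The general form D(a, b, c, d) = (b + c) / (αa + b + c) can be illustrated with a small sketch. It assumes binary vectors of equal length and uses the four presence/absence counts a, b, c, d as they are defined in the Baulieu (1997) context on this page. The names `match_counts` and `dissimilarity` are hypothetical, and returning 0 when the denominator is zero is a convention chosen here, not something the source states.

```python
def match_counts(w, w2):
    """Return (a, b, c, d): joint presences, the two mismatch counts, joint absences."""
    a = sum(1 for x, y in zip(w, w2) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(w, w2) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(w, w2) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(w, w2) if x == 0 and y == 0)
    return a, b, c, d


def dissimilarity(w, w2, alpha=1.0):
    """D = (b + c) / (alpha * a + b + c); joint absences d are ignored (beta = 0)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    a, b, c, _ = match_counts(w, w2)
    denominator = alpha * a + b + c
    if denominator == 0:
        return 0.0  # assumed convention: vectors with no presences are identical
    return (b + c) / denominator
```

With α = 1 this is the Jaccard dissimilarity; larger values of α give joint presences more weight and so shrink the dissimilarity of vectors that share many attributes.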

70 | COOLCAT: an entropy-based algorithm for categorical clustering
- Barbará, Li, et al.
- 2002
Citation Context ... many of which are not numerical. Examples of such include the country of origin and the color of eyes in demographic data. Many algorithms have been developed for clustering categorical data, e.g., (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997). Entropy-type measures for similarity among objects have been used from early on. In this paper, we ...

38 | Clustering Criteria and Multivariate Normal Mixtures
- Symons
- 1981
Citation Context ...oduce auxiliary vectors, $u_i = (u_{i,k})$, $1 \le i \le n$, $1 \le k \le K$, where $u_{i,k} = 1$ if and only if $x_i$ comes from the cluster $C_k$. These vectors are additional unknown parameters. The classification likelihood (Symons, 1981), denoted by $CL(a, u)$, is equal to: $CL(a, u) = \sum_{i=1}^{n}\sum_{k=1}^{K} u_{i,k}\log p(x_i \mid a_k) = \sum_{i=1}^{n}\sum_{k=1}^{K} u_{i,k}\log\prod_{j=1}^{r}\left(a_k^{(j)}\right)^{x_{i,j}}\left(1-a_k^{(j)}\right)^{1-x_{i,j}}$. It is easy to see that CL(a, u) = L(a) − LP(a, ..., where LP(a, u) = − ...
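The classification likelihood in this context can be sketched for binary data under a per-cluster Bernoulli model. This is an illustration under assumptions: hard assignments are passed as a list of cluster indices rather than as the indicator vectors u, each parameter a[k][j] is taken to be the probability that attribute j equals 1 in cluster k and is assumed to lie strictly between 0 and 1, and the function name `classification_log_likelihood` is hypothetical.

```python
import math


def classification_log_likelihood(X, labels, a):
    """Sum over points of log p(x_i | a_k), where k is the cluster assigned to x_i.

    X      : list of binary attribute vectors
    labels : labels[i] = k encodes u_{i,k} = 1 (hard assignment)
    a      : a[k][j] = P(attribute j = 1 | cluster k), assumed in (0, 1)
    """
    total = 0.0
    for x, k in zip(X, labels):
        for x_ij, a_kj in zip(x, a[k]):
            # Bernoulli term: a^x * (1 - a)^(1 - x), taken in log form.
            total += math.log(a_kj if x_ij == 1 else 1.0 - a_kj)
    return total
```

Only the term for the assigned cluster survives for each point, because u_{i,k} is zero for every other cluster. This is why the double sum over i and k collapses to a single pass over the data.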

19 | Clustering criteria for discrete data and latent class models
- Celeux, Govaert
- 1991
Citation Context ...$\sum_{i=1}^{r}\sum_{t\in V_i} p(y_i = t)\log p(y_i = t)$. We will use $\hat{H}$ for the estimated entropy of the partition. 3. Classical Entropy Criterion 3.1. Entropy Criterion The classical clustering criterion (Bock, 1989; Celeux & Govaert, 1991) searches for a partition $C$ that maximizes the following quantity $O(C)$: $O(C) = \sum_{k=1}^{K}\sum_{j=1}^{r}\sum_{t=0}^{1}\frac{N_{j,k,t}}{N}\log\frac{N N_{j,k,t}}{N_k N_{j,t}} = $ ...

17 | Classification of binary vectors by stochastic complexity
- Gyllenberg, Koski, et al.
- 1997
Citation Context ...of eyes in demographic data. Many algorithms have been developed for clustering categorical data, e.g., (Barbara et al., 2002; Gibson et al., 1998; Huang, 1998; Ganti et al., 1999; Guha et al., 2000; Gyllenberg et al., 1997). Entropy-type measures for similarity among objects have been used from early on. In this paper, we show that the entropy-based clustering criterion can be formally derived in the framework of proba...

13 | Probabilistic aspects in cluster analysis. Conceptual and Numerical Analysis of Data
- Bock
Citation Context ... of the distances of the points to their cluster centroids). However, if the data vectors contain categorical variables, geometric approaches are inappropriate and other strategies must be developed (Bock, 1989). Appearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. Copyright by the authors. The problem of clustering becomes more challenging when the data i...

13 | Finding natural clusters through entropy minimization
- Wallace
- 1989
Citation Context ... [Figure 1 labels: Dissimilarity Coefficients, KL Measure, Classification Likelihood, Penalty of Partition, Maximum Likelihood] Figure 1. A summary of relations among various clustering criteria. We note here that Wallace (Wallace, 1989) proposed a two-step procedure for numerical hierarchical cluster analysis by minimizing Gaussian entropy, defined based on the logarithm of the covariance matrix determinant. The relationships betwee...

12 | Maximum certainty data partitioning
- Roberts, Everson, et al.
- 2000
Citation Context ...observed dataset is generated by a number of classes. We first model the unconditional probability density function and then seek a number of partitions whose combination yields the density function (Roberts et al., 2000). The K–L measure then tries to measure the difference between the unconditional density and the density under partition. Let p(y) and q(y) be two distributions. Then KL(p(y) ‖ q(y)) = ... p(y...
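The K–L measure between two distributions can be illustrated with the standard discrete form, KL(p ‖ q) = Σ_y p(y) log(p(y)/q(y)). The formula in the context above is truncated, so the discrete sum and the natural logarithm used here are assumptions rather than the paper's exact expression, and the function name `kl_divergence` is hypothetical.

```python
import math


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence KL(p || q) in nats.

    p and q are probability vectors over the same outcomes. Terms with
    p(y) = 0 contribute nothing; q(y) = 0 with p(y) > 0 gives infinity.
    """
    total = 0.0
    for p_y, q_y in zip(p, q):
        if p_y == 0:
            continue
        if q_y == 0:
            return math.inf
        total += p_y * math.log(p_y / q_y)
    return total
```

The measure is zero exactly when the two distributions coincide and is not symmetric in its arguments, so it matters which of the unconditional density and the density under the partition plays the role of p.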

10 | Two variant axiom systems for presence/absence based dissimilarity coefficients
- Baulieu
- 1997
Citation Context ... are popular measures of the distances. 5.1. Dissimilarity Coefficients Given two data points, $w$ and $w'$, there are four fundamental quantities that can be used to define similarity between the two (Baulieu, 1997): $a = |\{j \mid w_j = w'_j = 1\}|$, $b = |\{j \mid w_j = 1 \wedge w'_j = 0\}|$, $c = |\{j \mid w_j = 0 \wedge w'_j = 1\}|$, and $d = |\{j \mid w_j = w'_j = 0\}|$, where $1 \le j \le r$. It has been shown in (Baulieu, 1997) that the presence/a...

10 | Efficient multi-way text categorization via generalized discriminant analysis
- Li, Zhu, et al.
- 2003
Citation Context ... technical reports published in the Department of Computer Science at the University of Rochester between 1991 and 2002. The TRs are available at http://www.cs.rochester.edu/trs. It has been used in (Li et al., 2003) for text categorization. The dataset contained 476 abstracts, which were divided into four research areas: Natural Language Processing (NLP), Robotics/Vision, Systems, and Theory. WebKB: The WebKB da...