## CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data (2002)

Venue: Proc. of KDD’02

Citations: 22 (2 self)

### BibTeX

@INPROCEEDINGS{Yang02clope:a,
  author = {Yiling Yang and Xudong Guan and Jinyuan You},
  title = {CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data},
  booktitle = {In: Proc of KDD’02},
  year = {2002},
  pages = {682--687}
}

### Abstract

This paper studies the problem of categorical data clustering, especially for transactional data characterized by high dimensionality and large volume. Starting from a heuristic method of increasing the height-to-width ratio of the cluster histogram, we develop a novel algorithm, CLOPE, which is very fast and scalable, while being quite effective. We demonstrate the performance of our algorithm on two real-world datasets, and compare CLOPE with state-of-the-art algorithms.
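The abstract does not spell out how the cluster histogram is built, so the following is only a minimal sketch under stated assumptions: each cluster's histogram is taken to plot distinct items against their occurrence counts, the width `W` is the number of distinct items, the size `S` is the total number of item occurrences, and the height is `H = S / W`. The function name `histogram_stats` and the toy transactions are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def histogram_stats(cluster):
    """Return (S, W, H) for a cluster of transactions.

    Assumed definitions (not stated in the abstract itself):
      S = total item occurrences in the cluster (histogram area)
      W = number of distinct items (histogram width)
      H = S / W (histogram height)
    """
    # Count each item once per transaction that contains it.
    counts = Counter(item for transaction in cluster for item in set(transaction))
    size = sum(counts.values())
    width = len(counts)
    height = size / width if width else 0.0
    return size, width, height


# Toy data: each string is one transaction, each character one item.
grouping_a = [["ab", "abc", "acd"], ["de", "def"]]
grouping_b = [["ab", "abc"], ["acd", "de", "def"]]

for name, grouping in (("A", grouping_a), ("B", grouping_b)):
    for cluster in grouping:
        s, w, h = histogram_stats(cluster)
        print(f"grouping {name}: cluster={cluster} S={s} W={w} H/W={h / w:.3f}")
```

Under these assumptions, grouping A yields heights of 2.0 and about 1.67, while grouping B yields about 1.67 and 1.6, so A has the larger height-to-width ratios. This matches the intuition in the abstract that clusters whose transactions share more items produce taller, narrower histograms.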

### Citations

2447 | Mining association rules between sets of items in large databases
- Agrawal, Imieliński, et al.
- 1993
Citation Context ...n words with respect to their frequencies. See [15] for some common approaches in document clustering. Also, there are some similarities between transactional data clustering and association analysis [2]. Both of these two popular data mining techniques can reveal some interesting properties of item co-occurrence and relationship in transactional databases. Moreover, current approaches [9] for associ...

1871 | Some methods of classification and analysis of multivariate observations
- MacQueen
- 1967
Citation Context ...clustering of transactional databases is extremely difficult because of the high dimensionality, sparsity, and huge volumes often characterizing these databases. Distance-based approaches like k-means [11] and CLARANS [12] are effective for low dimensional numerical data. Their performances on high dimensional categorical data, however, are often unsatisfactory [7]. Hierarchical clustering methods like...

1154 | Mining frequent patterns without candidate generation: a frequent-pattern tree approach
- Han, Pei, et al.
- 2004
Citation Context ...n analysis [2]. Both of these two popular data mining techniques can reveal some interesting properties of item co-occurrence and relationship in transactional databases. Moreover, current approaches [9] for association analysis needs only very few scans of the database. However, there are differences. On the one hand, clustering can give a general overview property of the data, while association ana...

1106 | A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
- Ester
- 1996
Citation Context ...the state-of-art algorithms. Keywords data mining, clustering, categorical data, scalability 1. INTRODUCTION Clustering is an important data mining technique that groups together similar data records [12, 14, 4, 1]. Recently, more attention has been put on clustering categorical data [10, 8, 6, 5, 7, 13], where records are made up of non-numerical attributes. Transactional data, like market basket data and web ...

597 | Efficient and effective clustering methods for spatial data mining
- Ng, Han
- 1994
Citation Context ...the state-of-art algorithms. Keywords data mining, clustering, categorical data, scalability 1. INTRODUCTION Clustering is an important data mining technique that groups together similar data records [12, 14, 4, 1]. Recently, more attention has been put on clustering categorical data [10, 8, 6, 5, 7, 13], where records are made up of non-numerical attributes. Transactional data, like market basket data and web ...

564 | Automatic subspace clustering of high dimensional data for data mining applications
- Agrawal, Gehrke, et al.
Citation Context ...the state-of-art algorithms. Keywords data mining, clustering, categorical data, scalability 1. INTRODUCTION Clustering is an important data mining technique that groups together similar data records [12, 14, 4, 1]. Recently, more attention has been put on clustering categorical data [10, 8, 6, 5, 7, 13], where records are made up of non-numerical attributes. Transactional data, like market basket data and web ...

440 | Data Preparation for Mining World Wide Web Browsing Patterns
- Cooley, Mobasher, et al.
- 1999
Citation Context ...s.berkeley.edu/logs/ as the dataset for our second experiment and test the scalability as well as performance of CLOPE. We use the web logs of November 2001 and preprocess it with methods proposed in [3]. There are about 7 million entries in the raw log file and 2 million of them are kept after non-html 3 entries removed. Among these 2 million entries, there are a total of 93,665 distinct pages. The ...

338 | ROCK: A Robust Clustering Algorithm for Categorical Attributes
- Guha, Rastogi, et al.
- 2000
Citation Context ...ability 1. INTRODUCTION Clustering is an important data mining technique that groups together similar data records [12, 14, 4, 1]. Recently, more attention has been put on clustering categorical data [10, 8, 6, 5, 7, 13], where records are made up of non-numerical attributes. Transactional data, like market basket data and web usage data, can be thought of a special type of categorical data having boolean value, with...

216 | BIRCH: an efficient data clustering method for very large databases
- Zhang, Ramakrishnan, et al.
- 1996
Citation Context ...S=22243 /, occ=19517 /Students/Classes, occ=2726 * number after page name is the occurrence in the cluster 5. RELATED WORK There are many works on clustering large databases, e.g. CLARANS [12], BIRCH [14], DBSCAN [4], CLIQUE [1]. Most of them are designed for low dimensional numerical data, exceptions are CLIQUE which finds dense subspaces in higher dimensions. Recently, many works on clustering large...

150 | Criterion functions for document clustering: experiments and analysis
- Zhao, Karypis
- 2002
Citation Context ... words in it. Clustering is carried out also by optimizing a certain criterion function. However, document clustering tends to assume different weights on words with respect to their frequencies. See [15] for some common approaches in document clustering. Also, there are some similarities between transactional data clustering and association analysis [2]. Both of these two popular data mining techniqu...

144 | Clustering categorical data: An approach based on dynamical systems
- Gibson, Kleinberg, et al.
- 1998
Citation Context ...ability 1. INTRODUCTION Clustering is an important data mining technique that groups together similar data records [12, 14, 4, 1]. Recently, more attention has been put on clustering categorical data [10, 8, 6, 5, 7, 13], where records are made up of non-numerical attributes. Transactional data, like market basket data and web usage data, can be thought of a special type of categorical data having boolean value, with...

93 | CACTUS: Clustering categorical data using summaries
- Ganti, Gehrke, et al.
- 1999
Citation Context ...ability 1. INTRODUCTION Clustering is an important data mining technique that groups together similar data records [12, 14, 4, 1]. Recently, more attention has been put on clustering categorical data [10, 8, 6, 5, 7, 13], where records are made up of non-numerical attributes. Transactional data, like market basket data and web usage data, can be thought of a special type of categorical data having boolean value, with...

88 | Clustering based on association rule hypergraphs
- Han, Karypis, et al.
- 1997
Citation Context ...mal distance to all the points. The distance in k-modes is measured by number of common categorical attributes shared by two points, with optional weights among different attribute values. Han et.al. [8] use association rule hypergraph partitioning to cluster items in large transactional database. STIRR [6] and CACTUS [5] also model categorical clustering as a hypergraph-partitioning problem, but the...

87 | A Fast Clustering Algorithm to Cluster Very Large Categorical Data
- Huang

62 | Clustering transactions using large items
- Wang, Xu, et al.
- 1999