Clustering in a High-Dimensional Space Using Hypergraph Models
Citations: 16 (3 self)
BibTeX
@MISC{Han_clusteringin,
author = {Eui-Hong (Sam) Han and George Karypis and Vipin Kumar and Bamshad Mobasher},
title = {Clustering in a High-Dimensional Space Using Hypergraph Models},
year = {}
}
Abstract
Clustering data in a high-dimensional space is of great interest in many data mining applications. Most traditional algorithms, such as K-means or AutoClass, fail to produce meaningful clusters in such data sets, even when they are used with well-known dimensionality reduction techniques such as Principal Component Analysis and Latent Semantic Indexing. In this paper, we propose a method for clustering data in a high-dimensional space based on a hypergraph model. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph. A hyperedge represents a relationship (affinity) among a subset of the data items, and the weight of the hyperedge reflects the strength of this affinity. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. We present results of experiments on three different data sets: S&P500 stock data for the period 1994-1996, protein coding data, and Web document data. Wherever applicable, we compared our results with those of the AutoClass and K-means clustering algorithms on the original data as well as on the reduced-dimensionality data obtained via Principal Component Analysis or Latent Semantic Indexing. These experiments demonstrate that our approach is applicable and effective in a wide range
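
The objective named in the abstract, minimizing the total weight of the hyperedges cut by a partitioning, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `cut_weight`, the dictionary representation of the hypergraph, and the toy item names and weights are all assumptions made for illustration. The sketch only scores a given partitioning; it does not search for a good one.

```python
from typing import Dict, FrozenSet, Hashable


def cut_weight(
    hyperedges: Dict[FrozenSet[Hashable], float],
    assignment: Dict[Hashable, int],
) -> float:
    """Sum the weights of hyperedges whose vertices span more than one part.

    hyperedges: maps each hyperedge (a frozenset of data items) to its weight.
    assignment: maps each data item to the index of its partition.
    Both structures are hypothetical; the abstract does not specify a format.
    """
    total = 0.0
    for members, weight in hyperedges.items():
        # A hyperedge is cut when its members fall into two or more parts.
        parts = {assignment[v] for v in members}
        if len(parts) > 1:
            total += weight
    return total


# Toy hypergraph (made-up items and weights, for illustration only).
toy_edges = {
    frozenset({"a", "b", "c"}): 0.9,  # strong affinity among a, b, c
    frozenset({"c", "d"}): 0.2,       # weak affinity between c and d
    frozenset({"d", "e"}): 0.8,       # strong affinity between d and e
}

# Partitioning 1 cuts only the weak hyperedge {c, d}.
split_weak = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
# Partitioning 2 cuts the strong hyperedge {a, b, c} instead.
split_strong = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}

cost_weak = cut_weight(toy_edges, split_weak)
cost_strong = cut_weight(toy_edges, split_strong)
```

On this toy input the first partitioning has the lower cut weight, because it keeps the strongly related items together and cuts only the weak hyperedge. A partitioning algorithm in the sense of the abstract would search for assignments with low cut weight.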







