Results 1 - 5 of 5
On the Impact of Dissimilarity Measure in k-Modes Clustering Algorithm
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
Cited by 1 (1 self)
This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in [4, 12], which allows the use of the k-modes paradigm to obtain a cluster with strong intra-similarity, and to efficiently cluster large categorical data sets. The main aim of this paper is to derive rigorously the updating formula of the k-modes clustering algorithm with the new dissimilarity measure, and to establish the convergence of the algorithm under the optimization framework. Index Terms: Data mining, clustering, k-modes algorithm, categorical data.
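As a rough illustration of the baseline this abstract builds on, the sketch below shows the plain simple matching dissimilarity and a per-attribute mode update. It is not the modified measure the paper analyzes, which the abstract does not spell out. The function names and the toy data are hypothetical, chosen only for this example.

```python
from collections import Counter


def simple_matching(x, y):
    """Count the attributes on which two categorical objects differ."""
    return sum(a != b for a, b in zip(x, y))


def mode_of(cluster):
    """Return, attribute by attribute, the most frequent category in a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def assign(objects, modes):
    """Assign each object to the index of its nearest mode under simple matching."""
    return [
        min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
        for x in objects
    ]


# Hypothetical toy data: four objects described by three categorical attributes.
objects = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]
modes = [objects[0], objects[2]]  # assumed initial modes
labels = assign(objects, modes)
```

A full k-modes run would alternate `assign` and `mode_of` until the labels stop changing; the loop and the handling of empty clusters are left out of this sketch.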
An Improved K-means Algorithm for Clustering Categorical Data
Abstract: Most earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the same time, working only on numerical data prohibits it from being used for clustering categorical data. This paper shows how to apply the notion of “cluster centers” to a dataset of categorical objects, and introduces a k-means-like algorithm for clustering categorical data.
Grid-based Knowledge Discovery in
, 2006
Knowledge discovery in clinico-genomic data is a task that requires integrating not only highly heterogeneous kinds of data, but also the requirements and interests of very different user groups. Technologies of grid computing promise to be an effective tool to combine all these requirements into a single architecture. In this paper, we describe scenarios and future research directions related to grid-based knowledge discovery in clinico-genomic data, and introduce the approach taken by the recently launched ACGT project. The whole endeavor is considered in the context of biomedical informatics research and aims towards the realization of an integrated and grid-enabled biomedical infrastructure. The presented integrated clinico-genomics knowledge discovery (ICGKD) scenario and its process realization are based on a multi-strategy data-mining approach that seamlessly integrates three distinct data-mining components: clustering, association rules mining, and feature selection. Preliminary experimental results are indicative of the rationale and reliability of the approach.
Modularity and Spectral Co-Clustering for Categorical Data
Abstract: To tackle the co-clustering problem on categorical data, we consider a spectral approach. We first define a generalized modularity measure for the co-clustering task. Then, we reformulate its maximization as a trace maximization problem. Finally, we develop a spectral-based co-clustering algorithm performing this maximization. The proposed algorithm is then able to cluster rows and columns simultaneously. Experimental results on synthetic and real data sets confirm the good performance of our algorithm.
A Spectral Based Clustering Algorithm for Categorical Data with Maximum Modularity
Abstract. In this paper we propose a spectral-based clustering algorithm to maximize an extended Modularity measure for categorical data. First, we establish the connection with the Relational Analysis criterion. Second, the maximization of the extended Modularity is shown to be a trace maximization problem. A spectral-based algorithm is then presented to search for the partitions maximizing the extended Modularity criterion. Experimental results indicate that the new algorithm is efficient and effective at finding a good clustering across a variety of real-world data sets.

