Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
Cited by 290 (21 self)
We consider the following two instances of the projective clustering problem: Given a set S of n points in R^d and an integer k > 0, cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let w be the smallest value so that S can be covered by k hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most w. In the plane, the two problems are equivalent. It is NP-Hard to compute k planar strips of width even at most Cw, for any constant C > 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w that cover S. Its expected running time is O(nk^2 log^4 n) if k^2 log k <= n; it also works for larger values of k, but then the expected running time is O(n^{2/3} k^{8/3} log^4 n). We also propose another algorithm that computes a c...
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
, 1998
Cited by 223 (3 self)
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real-world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well-known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real-world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.
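The ideas named in this abstract (simple matching dissimilarity, modes in place of means, frequency-based mode updates, and a combined measure for mixed data) can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a batch variant that reassigns all objects and then recomputes modes, takes the initial modes from the caller, and uses a hypothetical weight `gamma` to balance the numeric and categorical parts of the combined measure. All function names are illustrative.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: count the attributes on which two objects differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects):
    """Frequency-based mode: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


def k_modes(data, initial_modes, max_iter=100):
    """Batch k-modes sketch (assumption: assign all objects, then update modes)."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each object to the nearest mode under simple matching.
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(obj, modes[j]))
            for obj in data
        ]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        # Recompute the mode of every non-empty cluster.
        for j in range(len(modes)):
            members = [obj for obj, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = cluster_mode(members)
    return labels, modes


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Combined measure for mixed data: squared Euclidean distance on the
    numeric part plus gamma times simple matching on the categorical part.
    gamma is a hypothetical weight chosen by the caller."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + gamma * matching_dissimilarity(cat_a, cat_b)


if __name__ == "__main__":
    data = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("a", "x")]
    labels, modes = k_modes(data, initial_modes=[("a", "x"), ("b", "z")])
    print(labels, modes)
```

Replacing `matching_dissimilarity` with `mixed_dissimilarity` in the assignment step, and updating numeric attributes by their mean, would give a k-prototypes-style variant under the same assumptions.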
A fuzzy k-modes algorithm for clustering categorical data
 IEEE Transactions on Fuzzy Systems
, 1999
Cited by 52 (5 self)
©1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
of Programmed Exception Handling
 Miranda Computer Society
, 1977
"... and other molecular profiles ..."
Dynamics of projective adaptive resonance theory model: the foundation of PART algorithm
 IEEE Transactions on Neural Networks
, 2004
Cited by 8 (0 self)
network developed by Cao and Wu recently has been shown to be very effective in clustering data sets in high dimensional spaces. The PART algorithm is based on the assumptions that the model equations of PART (a large scale and singularly perturbed system of differential equations coupled with a reset mechanism) have quite regular computational performance. This paper provides a rigorous proof of these regular dynamics of the PART model when the signal functions are special step functions, and provides additional simulation results to illustrate the computational performance of PART. Index Terms: Neural networks, data clustering, learning and adaptive systems, pattern recognition, differential equations.
A Clustering Algorithm for Predicting CardioVascular Risk
Cited by 2 (2 self)
Cluster analysis is one area of machine learning of particular interest to data mining. It provides for the organization of a collection of patterns, represented as a vector in a multidimensional space, into clusters based on the similarity of these patterns. Medical decision support is also of increasing research interest. Ongoing collaborations between cardiovascular clinicians and computer science are looking at the application of neural networks, and in particular clustering, to the area of individual patient diagnosis, based on clinical records. The cardiovascular domain is characterized as a mixture of continuous and discrete data. This limits the use of the K-means algorithm, which is widely used for partitioning clusters in data mining. This paper presents an improvement on the K-means algorithm (KMIX) and allows its application to the mixture of attribute types found in the cardiovascular domain. Index Terms: Clustering; KMIX; K-means; dissimilarity; patient diagnosis.
Estimation and classification of fMRI hemodynamic response patterns
 NeuroImage
, 2004
"... www.elsevier.com/locate/ynimg ..."
A MULTIPERSPECTIVE EVALUATION OF MA AND GA FOR COLLABORATIVE FILTERING RECOMMENDER SYSTEM
The rising popularity of evolutionary algorithms for solving complex problems has inspired researchers to explore their utility in recommender systems. Recommender systems are intelligent web applications which generate recommendations keeping in view the user’s stated and unstated requirements. Evolutionary approaches such as genetic and memetic algorithms are considered among the most successful approaches to combinatorial optimization. Memetic Algorithms (MAs) are enhanced genetic algorithms which incorporate local search in the evolutionary scheme. Applying local search to each solution after every generation helps improve the convergence time of an MA. This paper presents a multi-perspective comparative evaluation of memetic and genetic evolutionary algorithms for a model-based collaborative filtering recommender system. An experimental study was conducted on the MovieLens dataset to investigate the decision support and statistical efficiency of memetic and genetic algorithms. The algorithms were analyzed from different perspectives: variation in the number of clusters, the effect of increasing the number of users, a varying number of recommendations, and the use of either one or more than one cluster for computing ratings of the unrated items. The results demonstrate that, from all perspectives, the memetic collaborative filtering algorithm has better predictive accuracy than the genetic collaborative filtering algorithm.
New Delhi,Delhi62,India
Clustering is an unsupervised technique of Data Mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure. This paper captures the problems that are faced in practice when clustering algorithms are implemented. It also considers the most extensively used tools, which are readily available and provide support functions that ease the programming. Once algorithms have been implemented, they also need to be tested for their validity. Several validation indexes exist for testing performance and accuracy, and these are also discussed here.