Approximation Algorithms for Projective Clustering
- Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia, 2000
Abstract
Cited by 196 (14 self)
We consider the following two instances of the projective clustering problem: given a set $S$ of $n$ points in $\mathbb{R}^d$ and an integer $k > 0$, cover $S$ by $k$ hyper-strips (resp. hyper-cylinders) so that the maximum width of a hyper-strip (resp. the maximum diameter of a hyper-cylinder) is minimized. Let $w^*$ be the smallest value such that $S$ can be covered by $k$ hyper-strips (resp. hyper-cylinders), each of width (resp. diameter) at most $w^*$. In the plane, the two problems are equivalent. It is NP-hard to compute $k$ planar strips of width even at most $Cw^*$, for any constant $C > 0$ [50]. This paper contains four main results related to projective clustering: (i) For $d = 2$, we present a randomized algorithm that computes $O(k \log k)$ strips of width at most $6w^*$ that cover $S$. Its expected running time is $O(nk^2 \log^4 n)$ if $k^2 \log k \le n$; it also works for larger values of $k$, but then the expected running time is $O(n^{2/3} k^{8/3} \log^4 n)$. We also propose another algorithm that computes a c...
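To make $w^*$ concrete in the simplest case ($k = 1$, $d = 2$), the Python sketch below computes the width of the narrowest single strip covering a planar point set. It relies on the standard geometric fact that an optimal strip has one boundary line passing through an edge of the convex hull, so it tries every hull edge and takes the farthest point from it. The function names are illustrative assumptions, and this is not the paper's randomized $O(k \log k)$-strip algorithm; for general $k$ the problem is NP-hard, as noted above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_strip_width(points: List[Point]) -> float:
    """Width of the narrowest single strip (k = 1) that covers all points."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # all points collinear: a zero-width strip suffices
    best = math.inf
    h = len(hull)
    for i in range(h):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % h]
        edge_len = math.hypot(x2 - x1, y2 - y1)
        # farthest hull vertex from the line through this edge = strip width
        far = max(abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / edge_len
                  for px, py in hull)
        best = min(best, far)
    return best
```

For example, the triangle with vertices $(0,0)$, $(4,0)$, $(0,3)$ has width $2.4$, its shortest altitude (onto the hypotenuse). Checking all $h$ hull edges against all hull vertices costs $O(h^2)$ after the $O(n \log n)$ hull; rotating calipers would reduce the edge loop to linear time.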
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
- 1998
Abstract
Cited by 109 (2 self)
The k-means algorithm is well known for its efficiency in clustering large data sets. However, because it works only on numeric values, it cannot be used to cluster real-world data containing categorical values. In this paper we present two algorithms that extend the k-means algorithm to categorical domains and to domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process so as to minimise the clustering cost function. With these extensions, the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well-known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real-world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.
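The ingredients named above (simple matching dissimilarity, modes in place of means, frequency-based mode updates, and a combined measure for mixed data) can be sketched in Python as follows. This is a minimal batch variant under our own assumptions: every object is reassigned before the modes are recomputed, initial modes are supplied by the caller, and the mixed measure is taken as squared Euclidean distance on numeric attributes plus a weight `gamma` times the categorical mismatch count. All function names are hypothetical and this is not the paper's reference implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes whose categories differ."""
    return sum(a != b for a, b in zip(x, y))


def cluster_mode(objects: List[Sequence[str]]) -> Tuple[str, ...]:
    """Most frequent category per attribute (ties go to the first one encountered)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


def k_modes(data: List[Sequence[str]], initial_modes: List[Sequence[str]],
            max_iter: int = 100) -> Tuple[List[int], List[Tuple[str, ...]]]:
    """Batch k-modes: assign each object to its nearest mode, recompute modes, repeat."""
    modes = [tuple(m) for m in initial_modes]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)),
                          key=lambda j: matching_dissimilarity(obj, modes[j]))
                      for obj in data]
        if new_labels == labels:
            break  # no object changed cluster: converged
        labels = new_labels
        for j in range(len(modes)):
            members = [obj for obj, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = cluster_mode(members)
    return labels, modes


def prototype_dissimilarity(num_x: Sequence[float], cat_x: Sequence[str],
                            num_y: Sequence[float], cat_y: Sequence[str],
                            gamma: float) -> float:
    """Mixed dissimilarity: squared Euclidean on numeric attributes plus
    gamma times the categorical mismatch count."""
    numeric = sum((a - b) ** 2 for a, b in zip(num_x, num_y))
    return numeric + gamma * matching_dissimilarity(cat_x, cat_y)
```

The weight `gamma` balances the two parts of the mixed measure: a small value lets numeric attributes dominate cluster assignment, a large one lets categorical mismatches dominate.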
of Programmed Exception Handling
- Miranda Computer Society, 1977
"... and other molecular profiles ..."
BIOINFORMATICS ORIGINAL PAPER
Abstract
Motivation: Transcription-factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFM). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices. Results: We describe a PFM similarity quantification method based on product multinomial distributions, demonstrate its ability to identify PFM similarity, and show that it achieves a better false-positive to false-negative ratio than existing methods. We grouped TFBS frequency matrices from two libraries into matrix families and identified the matrices that are common to both libraries and those unique to each. We identified similarities and differences between the skeletal-muscle-specific and nonmuscle-specific frequency matrices for the binding sites of Mef-2, Myf, Sp-1, SRF and TEF of Wasserman and Fickett. We further identified known frequency matrices and matrix families that were strongly similar to the matrices given by Wasserman and Fickett. We provide methodology and tools to compare and query libraries of frequency matrices for TFBSs. Availability: Software is available to use over the Web at
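The abstract does not give the similarity formula, so the Python sketch below only illustrates the general idea of a product multinomial model under our own assumptions: each PFM column is treated as an independent multinomial over A, C, G, T, and two already aligned PFMs of equal length are compared with a likelihood-ratio (G) test of homogeneity, summed over columns. The function names and the pre-aligned input are hypothetical; this is not the authors' published method.

```python
import math
from typing import List, Sequence, Tuple

# A PFM is a list of columns; each column holds counts for (A, C, G, T).
Column = Sequence[float]


def column_g_statistic(c1: Column, c2: Column) -> Tuple[float, int]:
    """G statistic and degrees of freedom for testing whether two count vectors
    come from the same multinomial (a 2 x 4 contingency table)."""
    n1, n2 = sum(c1), sum(c2)
    total = n1 + n2
    g = 0.0
    used_letters = 0
    for a, b in zip(c1, c2):
        letter_total = a + b
        if letter_total == 0:
            continue  # letter absent from both columns: contributes nothing
        used_letters += 1
        for observed, row_total in ((a, n1), (b, n2)):
            expected = row_total * letter_total / total
            if observed > 0:
                g += 2.0 * observed * math.log(observed / expected)
    return g, max(used_letters - 1, 0)


def pfm_g_statistic(pfm1: List[Column], pfm2: List[Column]) -> Tuple[float, int]:
    """Sum per-column statistics, treating columns as independent multinomials
    (a product multinomial model). Assumes the PFMs are already aligned."""
    if len(pfm1) != len(pfm2):
        raise ValueError("PFMs must have the same length (align them first)")
    g_total, df_total = 0.0, 0
    for c1, c2 in zip(pfm1, pfm2):
        g, df = column_g_statistic(c1, c2)
        g_total += g
        df_total += df
    return g_total, df_total
```

Identical matrices give a statistic of zero. A large statistic relative to the returned degrees of freedom suggests the matrices differ; it can be turned into an approximate p-value with a χ² survival function such as `scipy.stats.chi2.sf(g, df)`.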
A Clustering Algorithm for Predicting CardioVascular Risk
Abstract
Cluster analysis is one area of machine learning of particular interest to data mining. It organizes a collection of patterns, each represented as a vector in a multidimensional space, into clusters based on the similarity of these patterns. Medical decision support is also of increasing research interest. Ongoing collaborations between cardiovascular clinicians and computer scientists are examining the application of neural networks, and clustering in particular, to individual patient diagnosis based on clinical records. Data in the cardiovascular domain are a mixture of continuous and discrete attributes. This limits the use of the K-means algorithm, which is widely used for partitioning data into clusters in data mining. This paper presents KMIX, an improvement on the K-means algorithm that allows it to be applied to the mixture of attribute types found in the cardiovascular domain. Index Terms: Clustering; KMIX; K-means; dissimilarity; patient diagnosis.
A MULTI-PERSPECTIVE EVALUATION OF MA AND GA FOR COLLABORATIVE FILTERING RECOMMENDER SYSTEM
Abstract
The rising popularity of evolutionary algorithms for solving complex problems has inspired researchers to explore their utility in recommender systems. Recommender systems are intelligent web applications that generate recommendations based on the user’s stated and unstated requirements. Evolutionary approaches such as genetic and memetic algorithms are considered among the most successful approaches to combinatorial optimization. Memetic algorithms (MAs) are enhanced genetic algorithms that incorporate local search into the evolutionary scheme. Applying local search to each solution after every generation helps improve the convergence time of an MA. This paper presents a multi-perspective comparative evaluation of memetic and genetic evolutionary algorithms for a model-based collaborative filtering recommender system. An experimental study was conducted on the MovieLens dataset to investigate the decision support and statistical efficiency of memetic and genetic algorithms. The algorithms were analyzed from several perspectives: varying the number of clusters, increasing the number of users, varying the number of recommendations, and using either one or more than one cluster to compute ratings for unrated items. The results demonstrate that, from all perspectives, the memetic collaborative filtering algorithm has better predictive accuracy than the genetic collaborative filtering algorithm.
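The distinction drawn above (an MA is a GA that applies local search to each solution every generation) can be sketched in Python as follows. This is a generic illustration, not the paper's recommender setup: the toy OneMax fitness (count of 1 bits) is a hypothetical stand-in for a collaborative filtering objective, and the tournament selection, one-point crossover, bit-flip mutation, and first-improvement hill climbing are our own choices. The same `evolve` function runs as a plain GA or, with `local_search=True`, as an MA.

```python
import random
from typing import Callable, List, Optional

Genome = List[int]


def onemax(g: Genome) -> int:
    """Toy fitness (hypothetical stand-in for a recommender objective): count of 1 bits."""
    return sum(g)


def hill_climb(g: Genome, fitness: Callable[[Genome], int]) -> Genome:
    """First-improvement bit-flip local search, run until no single flip helps."""
    g = g[:]
    best = fitness(g)
    improved = True
    while improved:
        improved = False
        for i in range(len(g)):
            g[i] ^= 1
            f = fitness(g)
            if f > best:
                best, improved = f, True
            else:
                g[i] ^= 1  # undo a flip that did not help
    return g


def evolve(fitness: Callable[[Genome], int], length: int, pop_size: int = 20,
           generations: int = 30, mutation_rate: float = 0.02,
           local_search: bool = False, seed: Optional[int] = 0) -> Genome:
    """Plain GA when local_search is False; memetic algorithm when True."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        if local_search:
            pop = [hill_climb(g, fitness) for g in pop]  # the memetic step

        def tournament() -> Genome:
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children: List[Genome] = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    if local_search:
        pop = [hill_climb(g, fitness) for g in pop]
    return max(pop, key=fitness)
```

On OneMax the local search alone reaches the optimum, so this toy exaggerates the MA's advantage; on realistic objectives, local search only moves each individual to a nearby local optimum, and the GA operators still have to explore between them.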
An Efficient K-Means and C-Means Clustering Algorithm for Image Segmentation
- 2012
Abstract
In this paper, we present a novel algorithm for performing k-means clustering. It organizes all the patterns in a k-d tree so that all the patterns closest to a given prototype can be found efficiently. The main intuition behind our approach is as follows. At the root level, every prototype is a potential candidate for the closest prototype. For the children of the root node, however, we may be able to prune the candidate set using simple geometric constraints. This approach can be applied recursively until the candidate set of each node contains a single prototype. Our experimental results demonstrate that our scheme can improve the computational speed of the direct k-means algorithm by one to two orders of magnitude, both in the total number of distance calculations and in the overall computation time.
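The candidate-pruning idea described above can be sketched in Python as follows, assuming squared Euclidean distance and an axis-aligned bounding box at each k-d tree node. A candidate prototype is dropped when its minimum possible distance to the node's box exceeds the smallest maximum distance achieved by any candidate, because it then cannot be closest to any point in that node. The class and function names, the median split on the widest dimension, and the leaf size are our own choices rather than details from the paper; the counter records only point-to-prototype distance evaluations, not the box computations.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """k-d tree node storing (index, point) pairs and their bounding box."""

    def __init__(self, points: List[Tuple[int, Point]], leaf_size: int = 4):
        self.points = points
        dims = len(points[0][1])
        self.lo = tuple(min(p[d] for _, p in points) for d in range(dims))
        self.hi = tuple(max(p[d] for _, p in points) for d in range(dims))
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        if len(points) > leaf_size:
            # split at the median of the widest dimension
            axis = max(range(dims), key=lambda d: self.hi[d] - self.lo[d])
            ordered = sorted(points, key=lambda ip: ip[1][axis])
            mid = len(ordered) // 2
            self.left = Node(ordered[:mid], leaf_size)
            self.right = Node(ordered[mid:], leaf_size)


def box_min_dist2(c, lo, hi) -> float:
    """Smallest squared distance from c to any point of the box [lo, hi]."""
    return sum((lo_d - x) ** 2 if x < lo_d else (x - hi_d) ** 2 if x > hi_d else 0.0
               for x, lo_d, hi_d in zip(c, lo, hi))


def box_max_dist2(c, lo, hi) -> float:
    """Largest squared distance from c to any point of the box [lo, hi]."""
    return sum(max((x - lo_d) ** 2, (x - hi_d) ** 2) for x, lo_d, hi_d in zip(c, lo, hi))


def assign(node: Node, centroids, candidates: List[int], labels: List[int],
           counter: List[int]) -> None:
    if len(candidates) > 1:
        # every box point is within `bound` of some candidate, so a candidate
        # whose nearest box point is farther away can never be the closest
        bound = min(box_max_dist2(centroids[j], node.lo, node.hi) for j in candidates)
        candidates = [j for j in candidates
                      if box_min_dist2(centroids[j], node.lo, node.hi) <= bound]
    if len(candidates) == 1:
        for i, _ in node.points:  # whole subtree assigned with no point distances
            labels[i] = candidates[0]
        return
    if node.left is None or node.right is None:  # leaf: compare directly
        for i, p in node.points:
            labels[i] = min(candidates, key=lambda j: dist2(p, centroids[j]))
            counter[0] += len(candidates)
        return
    assign(node.left, centroids, candidates, labels, counter)
    assign(node.right, centroids, candidates, labels, counter)


def filtered_assignment(root: Node, centroids, n: int) -> Tuple[List[int], int]:
    """Nearest-prototype label for each of the n points, plus the distance count."""
    labels = [-1] * n
    counter = [0]
    assign(root, centroids, list(range(len(centroids))), labels, counter)
    return labels, counter[0]


def kmeans_kd(points: List[Point], k: int, iters: int = 10, seed: int = 0):
    """Lloyd-style k-means whose assignment step uses the pruned k-d tree."""
    centroids = random.Random(seed).sample(points, k)
    root = Node(list(enumerate(points)))  # built once, reused every iteration
    labels: List[int] = []
    total = 0
    for _ in range(iters):
        labels, count = filtered_assignment(root, centroids, len(points))
        total += count
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids, total
```

Because pruned prototypes are strictly farther from every point in the node, the labels match a brute-force nearest-prototype search; the savings come from subtrees that are resolved to a single candidate without any point-level distance computations.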

