Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia
, 2000
Cited by 246 (21 self)
We consider the following two instances of the projective clustering problem: Given a set $S$ of $n$ points in $\mathbb{R}^d$ and an integer $k > 0$, cover $S$ by $k$ hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp., the maximum diameter of a hypercylinder) is minimized. Let $w^*$ be the smallest value so that $S$ can be covered by $k$ hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most $w^*$. In the plane, the two problems are equivalent. It is NP-hard to compute $k$ planar strips of width even at most $Cw^*$, for any constant $C > 0$ [50]. This paper contains four main results related to projective clustering: (i) For $d = 2$, we present a randomized algorithm that computes $O(k \log k)$ strips of width at most $6w^*$ that cover $S$. Its expected running time is $O(nk^2 \log^4 n)$ if $k^2 \log k \le n$; it also works for larger values of $k$, but then the expected running time is $O(n^{2/3} k^{8/3} \log^4 n)$. We also propose another algorithm that computes a c...
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
, 1998
Cited by 156 (2 self)
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real-world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well-known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real-world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.
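The three ingredients named in the abstract above (simple matching dissimilarity, modes in place of means, frequency-based mode updates) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the batch update loop (reassign all objects, then recompute all modes) is an assumption borrowed from the usual k-means iteration.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two objects differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(objects):
    """Frequency-based mode: most common category in each attribute column."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*objects))

def k_modes(data, initial_modes, max_iter=100):
    """Batch k-modes sketch (assumed Lloyd-style loop, not the paper's exact procedure)."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        # Assign every object to the mode it mismatches least.
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(obj, modes[j]))
            for obj in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        # Recompute each mode from the members currently assigned to it.
        for j in range(len(modes)):
            members = [obj for obj, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = cluster_mode(members)
    return labels, modes

data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
```

Seeding the modes with two of the input objects is only a convenience for the toy data; the abstract does not say how initial modes are chosen.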
of Programmed Exception Handling
 Miranda Computer Society
, 1977
"... and other molecular profiles ..."
(2004) Dynamics of projective adaptive resonance theory model: the foundation of PART algorithm
 IEEE Transactions on Neural Networks
Cited by 5 (0 self)
network developed by Cao and Wu recently has been shown to be very effective in clustering data sets in high-dimensional spaces. The PART algorithm is based on the assumptions that the model equations of PART (a large scale and singularly perturbed system of differential equations coupled with a reset mechanism) have quite regular computational performance. This paper provides a rigorous proof of these regular dynamics of the PART model when the signal functions are special step functions, and provides additional simulation results to illustrate the computational performance of PART. Index Terms — Neural networks, data clustering, learning and adaptive systems, pattern recognition, differential equations.
A Clustering Algorithm for Predicting CardioVascular Risk
Cited by 2 (2 self)
Abstract—Cluster analysis is one area of machine learning of particular interest to data mining. It provides for the organization of a collection of patterns, represented as a vector in a multidimensional space, into clusters based on the similarity of these patterns. Medical decision support is also of increasing research interest. Ongoing collaborations between cardiovascular clinicians and computer science are looking at the application of neural networks, and in particular clustering, to the area of individual patient diagnosis, based on clinical records. The cardiovascular domain is characterized as a mixture of continuous and discrete data. This limits the use of the K-means algorithm, which is widely used for partitioning clusters in data mining. This paper presents an improvement on the K-means algorithm (KMIX) and allows its application to the mixture of attribute types found in the cardiovascular domain. Index Terms — Clustering; KMIX; K-means; dissimilarity; patient diagnosis.
BIOINFORMATICS ORIGINAL PAPER
Motivation: Transcription-factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFM). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices. Results: We describe a PFM similarity quantification method based on product multinomial distributions, demonstrate its ability to identify PFM similarity and show that it has a better false positive to false negative ratio compared to existing methods. We grouped TFBS frequency matrices from two libraries into matrix families and identified the matrices that are common and unique to these libraries. We identified similarities and differences between the skeletal-muscle-specific and non-muscle-specific frequency matrices for the binding sites of Mef2, Myf, Sp1, SRF and TEF of Wasserman and Fickett. We further identified known frequency matrices and matrix families that were strongly similar to the matrices given by Wasserman and Fickett. We provide methodology and tools to compare and query libraries of frequency matrices for TFBSs. Availability: Software is available to use over the Web at
A MULTIPERSPECTIVE EVALUATION OF MA AND GA FOR COLLABORATIVE FILTERING RECOMMENDER SYSTEM
The rising popularity of evolutionary algorithms to solve complex problems has inspired researchers to explore their utility in recommender systems. Recommender systems are intelligent web applications which generate recommendations keeping in view the user’s stated and unstated requirements. Evolutionary approaches like genetic and memetic algorithms have been considered among the most successful approaches for combinatorial optimization. Memetic Algorithms (MAs) are enhanced genetic algorithms which incorporate local search in the evolutionary scheme. Applying a local search process to each solution after every generation helps in improving the convergence time of MA. This paper presents a multi-perspective comparative evaluation of memetic and genetic evolutionary algorithms for a model-based collaborative filtering recommender system. An experimental study was conducted on the MovieLens dataset to investigate the decision support and statistical efficiency of memetic and genetic algorithms. Algorithms were analyzed from different perspectives like variation in the number of clusters, the effect of increasing the number of users, varying the number of recommendations and using either one or more than one cluster for computing ratings of the unrated items. The results demonstrated that, from all perspectives, the memetic collaborative filtering algorithm has better predictive accuracy than the genetic collaborative filtering algorithm.
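The abstract above describes a memetic algorithm as a genetic algorithm with a local search applied to each solution after every generation. A minimal sketch of that structure follows. It is not the paper's recommender model: the bit-string encoding, the toy fitness (count of ones), the truncation selection, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random

def local_search(bits, fitness):
    """Hill climbing: keep flipping single bits while any flip improves fitness."""
    bits = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            neighbour = bits[:i] + [1 - bits[i]] + bits[i + 1:]
            if fitness(neighbour) > fitness(bits):
                bits, improved = neighbour, True
    return bits

def memetic_algorithm(fitness, n_bits, pop_size=10, generations=20, seed=0):
    """Genetic step (selection, crossover, mutation) followed by local search."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Genetic step: keep the fitter half as parents (assumed truncation selection).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_bits)           # single-bit mutation
            child[i] = 1 - child[i]
            children.append(child)
        # Memetic step: refine every child with local search before the next generation.
        population = [local_search(child, fitness) for child in children]
    return max(population, key=fitness)

best = memetic_algorithm(fitness=sum, n_bits=12)
```

Dropping the `local_search` call from the last line of the loop turns the sketch back into a plain genetic algorithm, which is the comparison the abstract draws.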
An Efficient K-Means and C-Means Clustering Algorithm for Image Segmentation
, 2012
In this paper, we present a novel algorithm for performing k-means clustering. It organizes all the patterns in a k-d tree structure such that one can find all the patterns which are closest to a given prototype efficiently. The main intuition behind our approach is as follows. All the prototypes are potential candidates for the closest prototype at the root level. However, for the children of the root node, we may be able to prune the candidate set by using simple geometrical constraints. This approach can be applied recursively until the size of the candidate set is one for each node. Our experimental results demonstrate that our scheme can improve the computational speed of the direct k-means algorithm by one to two orders of magnitude in the total number of distance calculations and the overall time of computation.
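The pruning step described above can be sketched for a single tree node. The abstract only says "simple geometrical constraints"; the specific rule used here is an assumption: a prototype is dropped when even its nearest point in the node's bounding box is farther away than the farthest box point of some other prototype. Function names are hypothetical.

```python
def min_max_sq_dist(point, box_lo, box_hi):
    """Squared distances from point to the nearest and farthest points of an axis-aligned box."""
    d_min = d_max = 0.0
    for p, lo, hi in zip(point, box_lo, box_hi):
        nearest = min(max(p, lo), hi)                       # clamp p into [lo, hi]
        farthest = lo if abs(p - lo) > abs(p - hi) else hi  # box face farther from p
        d_min += (p - nearest) ** 2
        d_max += (p - farthest) ** 2
    return d_min, d_max

def prune_candidates(candidates, box_lo, box_hi):
    """Keep only prototypes that could be closest to some point inside the box."""
    bounds = [min_max_sq_dist(c, box_lo, box_hi) for c in candidates]
    # Some prototype is guaranteed to lie within best_max of every point in the box.
    best_max = min(d_max for _, d_max in bounds)
    # A prototype whose nearest box point is beyond best_max can never be the closest.
    return [c for c, (d_min, _) in zip(candidates, bounds) if d_min <= best_max]

survivors = prune_candidates(
    [(0.5, 0.5), (10.0, 10.0), (2.0, 0.5)],
    box_lo=(0.0, 0.0),
    box_hi=(1.0, 1.0),
)
```

Applied recursively, each child node would start from its parent's surviving candidates, so the candidate set can only shrink on the way down the tree.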
Principal Supervisor: Dr. Richi Nayak
Handling information overload online, from the user’s point of view, is a big challenge, especially when the number of websites is growing rapidly due to growth in e-commerce and other related activities. Personalization based on user needs is the key to solving the problem of information overload. Personalization methods help in identifying relevant information, which may be liked by a user. User profile and object profile are the important elements of a personalization system. When creating user and object profiles, most of the existing methods adopt two-dimensional similarity methods based on vector or matrix models in order to find inter-user and inter-object similarity. Moreover, for recommending similar objects to users, personalization systems use the users-users, items-items and users-items similarity measures. In most cases similarity measures such as Euclidean, Manhattan, cosine and many others based on vector or matrix methods are used to find the similarities. Web logs are high-dimensional datasets, consisting of multiple users, multiple searches with many attributes to each. Two-dimensional data analysis methods may often
BIOINFORMATICS ORIGINAL PAPER
Vol. 21 no. 3 2005, pages 307–313 doi:10.1093/bioinformatics/bth480 Similarity of position frequency matrices for transcription factor binding sites