An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm
, 1999
Abstract
Cited by 62 (0 self)
In this paper, we aim to compare empirically four initialization methods for the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the literature that its performance depends on two key points: the initial clustering and the instance order. We conduct a series of experiments to characterize (in terms of mean, maximum, minimum and standard deviation) the probability distribution of the square-error values of the final clusters returned by the K-Means algorithm, independently of any initial clustering and of any instance order, when each of the four initialization methods is used. The results of our experiments show that the random and Kaufman initialization methods outperform the other methods compared, as they make K-Means more effective and less dependent on the initial clustering and on the instance order. In addition, we compare the convergence speed of the K-Means algorithm when using each o...
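To make the comparison above concrete, here is a minimal Python sketch, not the authors' experimental code, of two of the four methods as they are commonly described: Forgy (k randomly chosen instances become the initial centers) and a random partition (instances are split at random into k groups whose centroids become the initial centers). The batch K-Means loop and the helper names (init_forgy, init_random_partition, summarize) are illustrative assumptions; summarize reports the mean, minimum, maximum and standard deviation of the final square error over repeated random starts, mirroring the statistics named in the abstract.

```python
import random
import statistics


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    # Coordinate-wise mean of a non-empty list of points.
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def init_forgy(data, k, rng):
    # Forgy: k instances drawn at random serve as the initial centers.
    return rng.sample(data, k)


def init_random_partition(data, k, rng):
    # Random partition: instances are shuffled into k groups (kept non-empty
    # here by dealing labels round-robin before shuffling, a simplifying
    # assumption) and the group centroids become the initial centers.
    labels = [i % k for i in range(len(data))]
    rng.shuffle(labels)
    return [centroid([p for p, l in zip(data, labels) if l == c]) for c in range(k)]


def kmeans(data, centers, max_iter=100):
    # Batch (Lloyd-style) K-Means: reassign every instance, then recompute centroids.
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = centroid(members)
    return centers, labels


def square_error(data, centers, labels):
    # Sum of squared distances from each instance to its cluster center.
    return sum(sq_dist(p, centers[l]) for p, l in zip(data, labels))


def summarize(data, k, init, runs=20, seed=0):
    # Distribution of the final square error over repeated random starts.
    rng = random.Random(seed)
    errors = []
    for _ in range(runs):
        centers, labels = kmeans(data, init(data, k, rng))
        errors.append(square_error(data, centers, labels))
    return {"mean": statistics.mean(errors), "min": min(errors),
            "max": max(errors), "std": statistics.pstdev(errors)}
```

For example, summarize(data, 3, init_forgy) and summarize(data, 3, init_random_partition) can be compared on the same data set; a smaller standard deviation indicates less sensitivity to the starting point. MacQueen and Kaufman initialization are left out to keep the sketch short.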
The effectiveness of Lloyd-type methods for the k-means problem
- In 47th IEEE Symposium on the Foundations of Computer Science (FOCS)
, 2006
Abstract
Cited by 32 (3 self)
We investigate variants of Lloyd’s heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd’s heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd’s heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd’s method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
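The abstract does not spell out the seeding process. As a hedged illustration of the general idea of probabilistic seeding, the sketch below uses squared-distance (D²) sampling in the style of k-means++, a closely related technique rather than the paper's exact procedure, which differs in its details. The function name d2_seeding is hypothetical.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def d2_seeding(data, k, rng):
    """Pick k initial centers for a Lloyd-type iteration.

    The first center is uniform at random; each further center is drawn with
    probability proportional to its squared distance from the nearest center
    chosen so far (k-means++-style D^2 sampling, assumed for illustration).
    """
    centers = [rng.choice(data)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in data]
        if sum(weights) == 0:
            # Every point coincides with an existing center; fall back to uniform.
            centers.append(rng.choice(data))
            continue
        centers.append(rng.choices(data, weights=weights, k=1)[0])
    return centers
```

Points far from every existing center are more likely to be chosen, so well-separated clusters tend to receive one seed each before the Lloyd iterations begin.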
Self-organizing Maps as Substitutes for K-Means Clustering
- In
, 2005
Abstract
Cited by 6 (3 self)
One of the most widely used clustering techniques in GISc problems is the k-means algorithm. One of the most important issues in the correct use of k-means is the initialization procedure, which ultimately determines which part of the solution space will be searched. In this paper we briefly review different initialization procedures and propose Kohonen's Self-Organizing Maps as the most convenient method, given the proper training parameters. Furthermore, we show that in the final stages of its training procedure the Self-Organizing Map algorithm is exactly equivalent to the k-means algorithm. Thus we propose the use of Self-Organizing Maps as possible substitutes for the more classical k-means clustering algorithms.
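The equivalence claimed above can be seen in the standard online SOM update rule. In this minimal sketch, a Gaussian neighborhood of width sigma on the map grid is an assumed, common choice (the paper's training parameters are not reproduced), and the function names are hypothetical. Setting sigma to zero leaves only the best-matching unit to move, which is exactly the online k-means update.

```python
import math


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(prototypes, x):
    # Index of the prototype closest to the input x.
    return min(range(len(prototypes)), key=lambda i: sq_dist(x, prototypes[i]))


def som_step(prototypes, grid, x, lr, sigma):
    # One online SOM update; grid[i] is the map coordinate of unit i.
    win = best_matching_unit(prototypes, x)
    updated = []
    for i, w in enumerate(prototypes):
        if sigma > 0:
            # Gaussian neighborhood: map neighbors of the winner also move.
            h = math.exp(-sq_dist(grid[i], grid[win]) / (2 * sigma ** 2))
        else:
            # Zero-width neighborhood: only the winner moves.
            h = 1.0 if i == win else 0.0
        updated.append(tuple(wj + lr * h * (xj - wj) for wj, xj in zip(w, x)))
    return updated


def online_kmeans_step(centers, x, lr):
    # Online k-means: only the nearest center moves toward x.
    win = best_matching_unit(centers, x)
    return [tuple(cj + lr * (xj - cj) for cj, xj in zip(c, x)) if i == win else tuple(c)
            for i, c in enumerate(centers)]
```

Shrinking sigma over training is what turns the early, topology-preserving SOM phase into a final phase that behaves like k-means; with a learning rate of 1/n for a unit's n-th win, the update becomes a running mean, as in MacQueen's variant.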
Chemical library subset selection algorithms: a unified derivation using spatial statistics
- Journal of Chemical Information and Computer Sciences
, 2002
Automatic Color Palette
Abstract
Cited by 1 (0 self)
Color palettes are an important tool for color image analysis, since they are the starting point of different techniques such as quantization or indexing. This paper presents a new method for the automatic construction of a color palette, which dynamically adjusts its number of colors according to the visual content of the image. The method is based on appropriately segmenting the HSI color space, which is achieved by individually partitioning the histograms associated with each color component. As a result we obtain a hierarchical color palette, which represents the color image with a reduced number of colors.
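The abstract does not say how each histogram is partitioned. One simple possibility, assumed here purely for illustration, is to cut a component's histogram at its valleys (strict local minima) and represent each resulting interval by its count-weighted mean, so the number of levels follows the image content instead of being fixed in advance. The function names and the valley rule are hypothetical, not the paper's method; the circular nature of hue and the hierarchical combination of the H, S and I components are not modeled.

```python
def histogram(values, bins, lo, hi):
    # Count values (assumed to lie in [lo, hi]) into equal-width bins.
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # hi falls into the last bin
        counts[idx] += 1
    return counts


def valley_cuts(counts):
    # Interior bins strictly lower than both neighbors (hypothetical cut rule).
    return [i for i in range(1, len(counts) - 1)
            if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]]


def partition(counts):
    # Split the bin range [0, len(counts)) at every valley.
    bounds = [0] + valley_cuts(counts) + [len(counts)]
    return [(bounds[j], bounds[j + 1]) for j in range(len(bounds) - 1)]


def interval_levels(counts, intervals, lo, hi):
    # One representative level per non-empty interval: count-weighted bin center.
    width = (hi - lo) / len(counts)
    levels = []
    for a, b in intervals:
        mass = sum(counts[a:b])
        if mass == 0:
            continue
        levels.append(sum(counts[i] * (lo + (i + 0.5) * width) for i in range(a, b)) / mass)
    return levels
```

Applied separately to the H, S and I channels, the per-channel levels could then be combined into palette entries; a histogram with more valleys yields more levels.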

