An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm (1999)
Citations: 62 - 0 self
BibTeX
@MISC{Pena99anempirical,
author = {J.M. Pena and J.A. Lozano and P. Larranaga},
title = {An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm},
year = {1999}
}
Abstract
In this paper, we empirically compare four initialization methods for the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the literature that its performance depends upon two key points: initial clustering and instance order. We conduct a series of experiments to characterize (in terms of mean, maximum, minimum and standard deviation) the probability distribution of the square-error values of the final clusters returned by the K-Means algorithm, independently of any initial clustering and of any instance order, when each of the four initialization methods is used. The results of our experiments illustrate that the random and the Kaufman initialization methods outperform the rest of the compared methods, as they make K-Means more effective and more independent of initial clustering and of instance order. In addition, we compare the convergence speed of the K-Means algorithm when using each o...
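The kind of experiment the abstract describes can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions: "random" is taken to mean a random partition of the instances whose centroids become the seeds, "Forgy" is taken to mean choosing K instances as seeds, and the clustering step is a plain batch (Lloyd-style) K-Means, which, unlike the variant studied in the paper, does not depend on instance order. The MacQueen and Kaufman methods are omitted. All function names are hypothetical.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def forgy_init(data, k, rng):
    """Assumed Forgy initialization: pick k distinct instances as seeds."""
    return rng.sample(data, k)


def random_partition_init(data, k, rng):
    """Assumed 'random' initialization: random partition, then centroids."""
    while True:
        labels = [rng.randrange(k) for _ in data]
        groups = [[p for p, lab in zip(data, labels) if lab == j]
                  for j in range(k)]
        if all(groups):  # retry if any cluster came out empty
            return [centroid(g) for g in groups]


def kmeans(data, centers, max_iter=100):
    """Batch K-Means; returns the final centers and the square error."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Keep the old center when a cluster receives no instances.
        new_centers = [centroid(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(min(sq_dist(p, c) for c in centers) for p in data)
    return centers, error


def summarize(data, k, init, runs=50, seed=0):
    """Mean, max, min and standard deviation of the final square error."""
    rng = random.Random(seed)
    errors = [kmeans(data, init(data, k, rng))[1] for _ in range(runs)]
    return {
        "mean": statistics.mean(errors),
        "max": max(errors),
        "min": min(errors),
        "std": statistics.pstdev(errors),
    }


if __name__ == "__main__":
    # Synthetic data for illustration only: three noisy groups in the plane.
    gen = random.Random(42)
    data = [(cx + gen.gauss(0, 1), cy + gen.gauss(0, 1))
            for cx, cy in [(0, 0), (8, 0), (4, 7)] for _ in range(30)]
    for name, init in [("random", random_partition_init),
                       ("Forgy", forgy_init)]:
        print(name, summarize(data, k=3, init=init))
```

A lower mean and a smaller standard deviation of the square error across runs would correspond to what the abstract calls a more effective and more initialization-independent method. The numbers from this toy setup say nothing about the paper's actual results.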







