An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm
, 1999
Cited by 62 (0 self)
In this paper, we aim to compare empirically four initialization methods for the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the literature that its performance depends upon two key points: initial clustering and instance order. We conduct a series of experiments to draw up (in terms of mean, maximum, minimum and standard deviation) the probability distribution of the square-error values of the final clusters returned by the K-Means algorithm, independently of any initial clustering and of any instance order, when each of the four initialization methods is used. The results of our experiments illustrate that the random and the Kaufman initialization methods outperform the rest of the compared methods, as they make the K-Means algorithm more effective and more independent of initial clustering and instance order. In addition, we compare the convergence speed of the K-Means algorithm when using each o...
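As a rough illustration of what an initialization method hands to K-Means, the sketch below implements two of the four schemes named in this abstract. It assumes their commonly cited definitions, which the abstract itself does not spell out: Forgy picks K instances as seeds, while the random method partitions the instances into K clusters at random and uses the cluster means. All function names, including the `square_error` helper, are hypothetical and not taken from the paper. MacQueen and Kaufman are omitted.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def forgy_init(data, k, rng):
    """Forgy (assumed definition): use k distinct instances as initial centroids."""
    return [tuple(p) for p in rng.sample(data, k)]


def random_partition_init(data, k, rng):
    """Random (assumed definition): random partition, then cluster means.

    Assumption of this sketch: the first k shuffled instances each get
    their own cluster, so that no cluster starts out empty.
    """
    order = list(range(len(data)))
    rng.shuffle(order)
    labels = [0] * len(data)
    for rank, idx in enumerate(order):
        labels[idx] = rank if rank < k else rng.randrange(k)
    return [
        centroid([p for p, label in zip(data, labels) if label == c])
        for c in range(k)
    ]


def square_error(data, centroids):
    """Sum of squared distances from each instance to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in data)
```

Calling either initializer many times with different seeds and recording `square_error` after K-Means converges gives the kind of mean, maximum, minimum and standard deviation summary the abstract describes.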
The effectiveness of Lloyd-type methods for the k-means problem
- In 47th IEEE Symposium on the Foundations of Computer Science (FOCS)
, 2006
Cited by 32 (3 self)
We investigate variants of Lloyd’s heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd’s heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd’s heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd’s method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
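For readers unfamiliar with the heuristic this abstract refers to, here is a minimal sketch of the textbook Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not one of the variants the paper proposes, and it takes the starting centroids as given rather than implementing the paper's seeding process. The function names and the `max_iter` cap are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(data, centroids, max_iter=100):
    """Run plain Lloyd iterations from the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its old centroid (a choice of this sketch).
        new_centroids = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # No centroid moved, so the assignment is stable.
        centroids = new_centroids
    return centroids
```

The result depends on the starting configuration passed in as `centroids`, which is why the abstract singles out the seeding process as its main algorithmic contribution.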
Detecting stable clusters using principal component analysis
- In Functional Genomics: Methods and Protocols. M.J. Brownstein and A. Khodursky (eds.) Humana Press, 2003
Cited by 14 (1 self)
Clustering is one of the most commonly used tools in the analysis of gene expression data (1, 2). The usage in grouping genes is based on the premise that co-expression is a result of co-regulation. It is thus a preliminary step in extracting gene networks and inference of gene function (3, 4). Clustering of experiments can be used to discover novel
Comparison of Unsupervised Classifiers
, 1996
Cited by 3 (0 self)
The activity of sorting like objects into classes without any help from an omniscient supervisor is known as unsupervised classification. In AI both symbolic and connectionist camps study classification. The statistical classifiers such as Autoclass and Snob search for the theory that can best explain the distribution of given data, whereas neural network classifiers such as Kohonen's networks and ART2 use the vector quantization principle for classifying data. Previously, many studies have compared supervised classification algorithms, but the more challenging problem of comparing unsupervised classifiers has largely been ignored. We performed an empirical comparison of ART2, Autoclass and Snob. We highlight the strengths and weaknesses of the various classifiers. Overall, statistical classifiers, especially Snob, perform better than their neural network counterpart ART2. Keywords: Unsupervised classification. Area of Interest: Concept formation and classification.
Three-mode partitioning
- Comput. Stat. Data Anal
, 2006
Cited by 2 (2 self)
The three-mode partitioning model is a clustering model for three-way three-mode data sets that implies a simultaneous partitioning of all three modes involved in the data. In the associated data analysis, a data array is approximated by a model array that can be represented by a three-mode partitioning model of a prespecified rank, minimizing a least-squares loss function in terms of differences between data and model. Algorithms have been proposed for this minimization, but their performance is not yet clear. A framework for alternating least-squares methods is described in order to offset the performance problem. Furthermore, a number of both existing and novel algorithms are discussed within this framework. An extensive simulation study is reported in which these algorithms are evaluated and compared according to sensitivity to local optima. The recovery of the truth underlying the data is investigated in order to assess the optimal estimates. When a collection of four empirical data sets is used, the ordering of the algorithms with respect to performance in finding the optimal solution appears to differ from the ordering obtained in the simulation study. This finding is attributed to violations of the implicit stochastic model underlying both the least-squares loss function and the simulation study. Support for the latter attribution is found in a second simulation study.
Knowledge Discovery From Distributed And Textual Data
- Hong Kong University of Science and Technology
, 1999
Optimal Matching and the Social Sciences
- In IATUR - XXVIII Annual Conference
, 2006
Cited by 1 (0 self)
Working papers do not reflect the position of INSEE but only the views of the authors. 1 Observatoire sociologique du changement (Science-po & CNRS) and Laboratoire de Sociologie Quantitative
Identifying subtypes of criminal psychopaths: A replication and extension
- Criminal Justice and Behavior
, 2007
Cited by 1 (0 self)
The online version of this article can be found at:

