Data Clustering: A Review
ACM Computing Surveys, 1999
Cited by 1678 (14 self)
Abstract
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify crosscutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
Survey of clustering algorithms
IEEE Transactions on Neural Networks, 2005
Cited by 383 (3 self)
Abstract
Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications on some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, such as proximity measures and cluster validation, are also discussed.
Survey of clustering data mining techniques
2002
Cited by 351 (0 self)
Abstract
Accrue Software, Inc.
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details but achieves simplification: it models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective, clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique ...
An Indexed Bibliography of Genetic Algorithms in Power Engineering
1995
Cited by 85 (10 self)
Abstract
...s: Jan. 1992 – Dec. 1994
• CTI: Current Technology Index: Jan./Feb. 1993 – Jan./Feb. 1994
• DAI: Dissertation Abstracts International: Vol. 53 No. 1 – Vol. 55 No. 4 (1994)
• EEA: Electrical & Electronics Abstracts: Jan. 1991 – Dec. 1994
• P: Index to Scientific & Technical Proceedings: Jan. 1986 – Feb. 1995 (except Nov. 1994)
• EI A: The Engineering Index Annual: 1987 – 1992
• EI M: The Engineering Index Monthly: Jan. 1993 – Dec. 1994
The following GA researchers have already kindly supplied their complete autobibliographies and/or proofread references to their papers: Dan Adler, Patrick Argos, Jarmo T. Alander, James E. Baker, Wolfgang Banzhaf, Ralf Bruns, I. L. Bukatova, Thomas Bäck, Yuval Davidor, Dipankar Dasgupta, Marco Dorigo, Bogdan Filipič, Terence C. Fogarty, David B. Fogel, Toshio Fukuda, Hugo de Garis, Robert C. Glen, David E. Goldberg, Martina Gorges-Schleuter, Jeffrey Horn, Aristides T. Hatjimihail, Mark J. Jakiela, Richard S. Judson, Akihiko Konaga...
Feature Selection in Unsupervised Learning via Evolutionary Search
In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
Cited by 71 (3 self)
Abstract
Feature subset selection is an important problem in knowledge discovery, not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. In this paper we consider the problem of feature selection for unsupervised learning. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multidimensional objective space. Each evolved solution represents a feature subset and a number of clusters; a standard K-means algorithm is applied to form the given number of clusters based on the selected features. Preliminary results on both real and synthetic data show promise in finding Pareto-optimal solutions through which we can identify the significant features and the correct number of clusters.
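The abstract describes ELSA as keeping a population that approximates the Pareto front over several cluster-quality criteria. ELSA's own selection mechanics are not given here, so the following is only a minimal sketch of the Pareto-dominance test that defines such a front. It assumes each candidate (a feature subset plus a number of clusters) has already been scored, for example after running K-means on the selected features, and that every score is to be maximized. The names `Candidate`, `dominates`, and `pareto_front` and the toy scores are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical evolved solution: a feature subset plus a cluster count."""
    features: Tuple[str, ...]
    k: int
    scores: Tuple[float, ...]  # cluster-quality criteria, all to be maximized


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    at_least_as_good = all(x >= y for x, y in zip(a.scores, b.scores))
    strictly_better = any(x > y for x, y in zip(a.scores, b.scores))
    return at_least_as_good and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]


# Toy scores (made up for illustration).
cands = [
    Candidate(("f1",), 2, (0.9, 0.2)),
    Candidate(("f1", "f2"), 3, (0.7, 0.6)),
    Candidate(("f2",), 2, (0.6, 0.5)),  # worse than the previous one on both criteria
    Candidate(("f1", "f2", "f3"), 4, (0.3, 0.9)),
]
front = pareto_front(cands)
for c in front:
    print(c.features, c.k, c.scores)
```

The sketch keeps every non-dominated candidate instead of collapsing the criteria into one weighted score. This mirrors the abstract's choice not to combine the criteria.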
J-Means: A New Local Search Heuristic for Minimum Sum-of-Squares Clustering
Cited by 45 (10 self)
Abstract
A new local search heuristic, called J-Means, is proposed for solving the minimum sum-of-squares clustering problem. The neighborhood of the current solution is defined by all possible centroid-to-entity relocations followed by corresponding changes of assignments. Moves are made in such neighborhoods until a local optimum is reached. The new heuristic is compared with two other well-known local search heuristics, K-Means and H-Means, as well as with H-Means+, an improved version of the latter in which degeneracy is removed. Moreover, another heuristic, which fits into the Variable Neighborhood Search metaheuristic framework and uses J-Means in its local search step, is also proposed. Results on standard test problems from the literature are reported. It appears that J-Means outperforms the other local search methods, quite substantially when many entities and clusters are considered.
1 Introduction
Consider a set $X = \{x_1, \ldots, x_N\}$, $x_j = (x_{1j}, \ldots, x_{qj}) \in \mathbb{R}^q$, of $N$ entiti...
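Write the centroids as $c_1, \ldots, c_k$ (our notation, not the paper's). The minimum sum-of-squares objective is then $f(c_1, \ldots, c_k) = \sum_{j=1}^{N} \min_{1 \le i \le k} \lVert x_j - c_i \rVert^2$.

Below is a simplified Python sketch of a J-Means-style local search. It rests on the following assumptions:

- Each jump moves one centroid onto an entity that no centroid currently occupies.
- After the jump, entities are reassigned to their nearest centroid and the centroids are recomputed once as cluster means before the move is scored.
- The search takes the best strictly improving jump and repeats until none remains.

The paper's exact move evaluation, initialization, and handling of degeneracy may differ. All function names, the random start, and the tolerance are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def sse(X: List[Point], C: List[Point]) -> float:
    """Minimum sum-of-squares objective: each entity pays its nearest centroid."""
    return sum(min(sq_dist(x, c) for c in C) for x in X)


def refit(X: List[Point], C: List[Point]) -> List[Point]:
    """Assign entities to their nearest centroid, then move each centroid to its cluster mean."""
    labels = [min(range(len(C)), key=lambda i: sq_dist(x, C[i])) for x in X]
    new_C: List[Point] = []
    for i, c in enumerate(C):
        members = [x for x, lab in zip(X, labels) if lab == i]
        if members:
            new_C.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_C.append(c)  # an emptied centroid stays where it was
    return new_C


def j_means_sketch(X: List[Point], k: int, seed: int = 0,
                   max_iter: int = 100) -> Tuple[List[Point], float]:
    """Best-improvement local search over centroid-to-entity jumps (simplified)."""
    rng = random.Random(seed)
    C = refit(X, rng.sample(X, k))  # hypothetical start: k random entities
    best = sse(X, C)
    for _ in range(max_iter):
        best_move = None
        for i in range(k):
            for x in X:
                if x in C:
                    continue  # entity already occupied by a centroid
                trial = refit(X, C[:i] + [x] + C[i + 1:])  # jump centroid i onto x
                cost = sse(X, trial)
                if cost < best - 1e-12:  # remember the best strictly improving jump
                    best, best_move = cost, trial
        if best_move is None:
            break  # no improving jump: local optimum reached
        C = best_move
    return C, best


# Toy data: two well-separated groups of three entities in R^2.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, cost = j_means_sketch(X, k=2)
print(round(cost, 3))  # 2.667
```

Each accepted move strictly lowers $f$, and every centroid is always the mean of a nonempty subset of $X$. The search can therefore visit only finitely many configurations, so it must stop; `max_iter` is an extra guard.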
GenIc: A Single Pass Generalized Incremental Algorithm for Clustering
In SIAM Int. Conf. on Data Mining, 2004
Cited by 27 (2 self)
Abstract
In this paper we introduce a new single-pass clustering algorithm called GenIc, designed with the objective of having low overall cost. We examine some of the properties of GenIc and compare it to windowed k-means. We also study its performance using experimental data sets obtained from network monitoring.
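The abstract does not spell out GenIc's update rule, so the following does not implement GenIc. It is a minimal sketch of a generic single-pass baseline, sequential (MacQueen-style) k-means, to show what a single pass means in practice. Each point is read once, and memory stays at k centroids and k counts regardless of stream length. The function name and the seeding rule (the first k points become centroids) are assumptions.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def online_kmeans(stream: Iterable[Point], k: int) -> Tuple[List[List[float]], List[int]]:
    """One pass over the stream; returns centroids and per-cluster counts."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for x in stream:
        if len(centroids) < k:
            centroids.append(list(x))  # the first k points seed the clusters
            counts.append(1)
            continue
        # nearest centroid by squared Euclidean distance
        i = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
        counts[i] += 1
        # incremental mean update: c <- c + (x - c) / n
        centroids[i] = [c + (a - c) / counts[i] for c, a in zip(centroids[i], x)]
    return centroids, counts


stream = [(0.0,), (10.0,), (1.0,), (11.0,), (2.0,), (12.0,)]
centroids, counts = online_kmeans(stream, k=2)
print(centroids, counts)  # [[1.0], [11.0]] [3, 3]
```

Because the first k points seed the clusters, the result depends on arrival order. That sensitivity is a known trade-off of this kind of single-pass scheme.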
Evolutionary Model Selection in Unsupervised Learning
2002
Cited by 17 (0 self)
Abstract
Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning situations, with some estimate of accuracy used to evaluate candidate subsets. However, we often cannot apply supervised learning for lack of a training signal. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multidimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, K-means and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method can consistently find approximate Pareto-optimal solutions through which we can identify the significant features and an appropriate number of clusters. This results in models with better and clearer semantic relevance.