Survey Of Clustering Data Mining Techniques (2002) [103 citations — 0 self]

Abstract:

Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics and numerical analysis. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective, clustering plays an outstanding role in data mining applications such as Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique computational requirements on relevant clustering algorithms. A variety of algorithms have recently emerged that meet these requirements and were successfully applied to real-life data mining problems. They are the subject of the survey.

Citations

5180 Genetic Algorithms – Goldberg - 1989
4923 Elements of Information Theory – Cover, Thomas - 1991
4735 Maximum Likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
3011 Pattern Classification and Scene Analysis – Duda, Hart - 1973
2274 Self-Organizing Maps – Kohonen - 1995
1771 Introduction to Statistical Pattern Recognition – Fukunaga - 1990
1709 R-trees: a dynamic index structure for spatial searching – Guttman - 1984
1478 Algorithms for Clustering Data – Jain, Dubes - 1988
1309 Randomized algorithms – Motwani, Raghavan - 1995
1125 Vector Quantization and Signal Compression – Gersho, Gray - 1992
971 Estimating the dimension of a model – Schwarz - 1978
970 Principal Component Analysis – Jolliffe - 1986
874 Data Mining: Concepts And Techniques – Han, Kamber - 2001
793 Clustering Algorithms – Hartigan - 1975
767 An efficient heuristic procedure for partitioning graphs – Kernighan, Lin - 1970
740 Modeling by shortest data description – Rissanen - 1978
728 Finding Groups in Data: An Introduction to Cluster Analysis – Kaufman, Rousseeuw - 1990
706 Pattern Recognition with Fuzzy Objective Function Algorithms – Bezdek - 1981
613 Data clustering: a review – Jain, Murty, et al. - 1999
572 The EM Algorithm and Extensions – McLachlan, Krishnan - 1997
570 A Density-Based Algorithm for Discovering Clusters – Ester, Kriegel, et al. - 1996
523 Knowledge Acquisition via Incremental Concept Formation – Fisher - 1987
506 Bayes factors – Kass, Raftery - 1995
502 On information and sufficiency – Kullback, Leibler - 1951
498 A fast and high quality multilevel scheme for partitioning irregular graphs – Karypis, Kumar - 1998
464 On spectral clustering: Analysis and an algorithm – Ng, Jordan, et al. - 2001
448 Multivariate analysis – Mardia, Kent, et al. - 1979
442 Efficient and effective clustering methods for spatial data mining – Ng, Han - 1994
442 Exploratory Data Analysis – Tukey - 1977
430 Scatter/gather: a cluster-based approach to browsing large document collections – Cutting, Karger, et al. - 1992
415 An algorithm for finding best matches in logarithmic expected time – Friedman, Bentley, et al. - 1977
410 Cluster Analysis for Applications – Anderberg - 1973
400 Using linear algebra for intelligent information retrieval – Berry, Dumais, et al. - 1995
397 Automatic subspace clustering of high dimensional data for data mining applications – Agrawal, Gehrke, et al. - 1998
385 Stochastic Complexity – Rissanen - 1987
374 Using information content to evaluate semantic similarity in a taxonomy – Resnik - 1995
362 Cure: an efficient clustering algorithm for large databases – Guha, Rastogi, et al. - 2001
361 Bayesian classification (AutoClass): Theory and results – Cheeseman, Stutz - 1995
325 An information-theoretic definition of similarity – Lin - 1998
322 Fast subsequence matching in time-series databases – Faloutsos, Ranganathan, et al.
320 Mixture models: inference and applications to clustering – McLachlan, Basford - 1988
313 FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia databases – Faloutsos, Lin - 1995
310 Efficient similarity search in sequence databases – Agrawal, Faloutsos, et al. - 1993
297 Data preparation for mining world wide web browsing – Cooley, Mobasher, et al. - 1999
269 BIRCH: an efficient data clustering method for very large databases – Zhang, Ramakrishnan, et al. - 1996
259 Toward optimal feature selection – Koller, Sahami - 1996
239 Multivariate Density Estimation – Scott - 1992
231 The information bottleneck method – Tishby, Pereira, et al. - 1999
224 Stochastic Complexity in Statistical Inquiry – Rissanen - 1989
221 Data mining approaches for intrusion detection – Lee, Stolfo - 1998