Data Clustering: A Review
- ACM Computing Surveys, 1999
Cited by 912 (9 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
Iterative Optimization and Simplification of Hierarchical Clusterings
- Journal of Artificial Intelligence Research, 1995
Cited by 96 (1 self)
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a 'tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been construct...
Concept Learning and Feature Selecting Based on Square-Error Clustering
- Machine Learning, 1999
Cited by 15 (4 self)
Based on a reinterpretation of the square-error criterion for classical clustering, a "separate-and-conquer" version of K-Means clustering is presented and a contribution weight is determined for each variable of every cluster. The weight is used to produce conjunctive concepts that describe clusters and to reduce or transform the variable (feature) space. Keywords: Clustering, variable weights, conjunctive concepts, feature selection, feature space transformation
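For readers unfamiliar with the square-error criterion mentioned in this abstract, the following is a minimal sketch of the classical definition: the sum of squared Euclidean distances from each point to its cluster centroid. It does not reproduce the paper's "separate-and-conquer" procedure or its variable weights; the function names (`centroid`, `square_error`, `explained_variance`) are illustrative assumptions, not taken from the paper.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def square_error(clusters):
    """Classical square-error criterion: total squared Euclidean distance
    from every point to the centroid of the cluster it belongs to."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum((x - m) ** 2 for p in pts for x, m in zip(p, c))
    return total


def explained_variance(clusters):
    """Share of the total scatter accounted for by the partition
    (1 minus within-cluster scatter over total scatter)."""
    all_pts = [p for pts in clusters for p in pts]
    total_scatter = square_error([all_pts])  # all points treated as one cluster
    return 1.0 - square_error(clusters) / total_scatter


clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(square_error(clusters))        # within-cluster scatter
print(explained_variance(clusters))  # close to 1 for well-separated clusters
```

K-Means alternates between reassigning points and recomputing centroids so that this quantity does not increase from one iteration to the next.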
Reinterpreting the Category Utility Function
- Boris Mirkin, 2001
Cited by 4 (0 self)
The category utility function is a partition quality scoring function applied in some clustering programs of machine learning. We reinterpret this function in terms of the data variance explained by a clustering, or, equivalently, in terms of the square-error classical clustering criterion that underlies the K-Means and Ward methods. This analysis suggests extensions of the scoring function to situations with differently standardized and mixed scale data. Keywords: Clustering, data standardization, contingency coefficient, correlation ratio, weighting features, mixed-scale data
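As a point of reference, here is a minimal sketch of category utility in the form commonly used for nominal features: the average, over clusters, of the cluster's probability times the gain in the expected number of correctly guessed feature values. The abstract does not say which variant the paper analyzes, so this form is an assumption, and the name `category_utility` and the data layout are hypothetical.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition of nominal data.

    clusters: list of clusters, each a non-empty list of equal-length
    tuples of nominal (hashable) feature values.
    """
    all_rows = [row for cluster in clusters for row in cluster]
    n = len(all_rows)
    n_features = len(all_rows[0])

    def sum_sq_probs(rows):
        # Sum over features and values of P(feature = value)^2 within rows.
        total = 0.0
        for i in range(n_features):
            counts = Counter(row[i] for row in rows)
            total += sum((cnt / len(rows)) ** 2 for cnt in counts.values())
        return total

    baseline = sum_sq_probs(all_rows)  # predictability without any clustering
    gain = sum(len(c) / n * (sum_sq_probs(c) - baseline) for c in clusters)
    return gain / len(clusters)


perfect = [[("a", "x"), ("a", "x")], [("b", "y"), ("b", "y")]]
print(category_utility(perfect))
```

A partition that puts everything in one cluster scores zero, since knowing the cluster adds nothing to the baseline predictability.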
Feature Selection and Incremental Learning of Probabilistic Concept Hierarchies
- In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
Cited by 4 (1 self)
Research in feature selection has paid little attention to unsupervised learning. In this paper we follow the guidelines suggested in previous work by Gennari and present some empirical results in incremental learning of probabilistic concept hierarchies. We identify different types of feature selection and justify the use of methods that run in parallel with learning and individually select a different set of features for each node in the hierarchy. We use a very simple and inexpensive approach that is flexible and powerful enough to explore our proposals. Results indicate that feature selection has a great potential for improving efficiency while maintaining or even improving performance.
Knowledge Discovery in an Object-Oriented Oceanographic Database System
- 1997
Cited by 2 (2 self)
The rate at which scientific data is collected today has overwhelmed the ability of scientists to store and analyze the data. Current research in knowledge discovery in databases is addressing this problem by developing techniques that can consider large quantities of data and automatically identify information that is of interest in a particular problem domain. This report describes the results of the first year's efforts in the development of a knowledge discovery system for use by oceanographers at the Naval Oceanographic Office at the Stennis Space Center in the identification of certain oceanographic features. The system consists of two major components: an object-oriented oceanographic database that can support the retrieval of data along various parameters of interest (such as a certain geographic area or a certain date) and a discovery system that can identify the features of interest. During the first year of this project, we (in consultation with the scientists at the Stennis...
Dynamic Feature Selection in Incremental Hierarchical Clustering
- In Machine Learning: ECML 2000, 11th European Conference on Machine Learning, 2000
Cited by 1 (0 self)
Feature selection has received a lot of attention in the machine learning community, but mainly under the supervised paradigm. In this work we study the potential benefits of feature selection in hierarchical clustering tasks. In particular, we address this problem in the context of incremental clustering, following the basic ideas of Gennari [8]. By using a simple implementation, we show that a feature selection scheme running in parallel with the learning process can improve the clustering task under the dimensions of accuracy, efficiency in learning, efficiency in prediction and comprehensibility.
Computer Society.
Cited by 1 (0 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

