Results 1-10 of 61
An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1993
Abstract
Cited by 286 (0 self)
A novel graph theoretic approach for data clustering is presented and its application to the image segmentation problem is demonstrated. The data to be clustered are represented by an undirected adjacency graph G with arc capacities assigned to reflect the similarity between the linked vertices. Clustering is achieved by removing arcs of G to form mutually exclusive subgraphs such that the largest intersubgraph maximum flow is minimized. For graphs of moderate size (2000 vertices), the optimal solution is obtained through partitioning a flow and cut equivalent tree of G, which can be efficiently constructed using the Gomory-Hu algorithm. However, for larger graphs this approach is impractical. New theorems for subgraph condensation are derived and are then used to develop a fast algorithm which hierarchically constructs and partitions a partially equivalent tree of much reduced size. This algorithm results in an optimal solution equivalent to that obtained by partitioning the complete equivalent tree and is able to handle very large graphs with several hundred thousand vertices. The new clustering algorithm is applied to the image segmentation problem. The segmentation is achieved by effectively searching for closed contours of edge elements (equivalent to minimum cuts in G), which consist mostly of strong edges, while rejecting contours containing isolated strong edges. This method is able to accurately locate region boundaries and at the same time guarantees the formation of closed edge contours. Index Terms: Clustering, edge contours, flow and cut equivalent tree, graph theory, image segmentation, subgraph condensation.
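The minimum-cut idea behind this abstract can be illustrated, well short of the paper's Gomory-Hu tree and subgraph-condensation machinery, with a toy s-t min-cut bipartition of a small similarity graph via a plain Edmonds-Karp max-flow. Graph, node names, and capacities below are invented for the sketch:

```python
from collections import deque, defaultdict

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow; returns (cut value, set of nodes on s's side
    of the min cut). cap: dict-of-dicts of residual capacities (mutated)."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            # no augmenting path left: nodes reachable from s form the cut side
            return flow, set(parent)
        # bottleneck capacity along the path found
        b, v = float("inf"), t
        while parent[v] is not None:
            b = min(b, cap[parent[v]][v])
            v = parent[v]
        # augment along the path
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= b
            cap[v][u] += b
            v = u
        flow += b

def make_graph(edges):
    """Undirected similarity graph: capacity in both directions."""
    g = defaultdict(lambda: defaultdict(int))
    for u, v, w in edges:
        g[u][v] += w
        g[v][u] += w
    return g

# Two tight groups {a, b} and {c, d} joined by a single weak arc.
g = make_graph([("a", "b", 5), ("c", "d", 5), ("b", "c", 1)])
value, side = max_flow_min_cut(g, "a", "d")
print(value, sorted(side))  # 1 ['a', 'b']
```

The minimum cut severs the weak arc, splitting the graph into the two similarity groups, which is the clustering principle the paper scales up to very large graphs.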
Robust Analysis of Feature Spaces: Color Image Segmentation
, 1997
Abstract
Cited by 200 (6 self)
A general technique for the recovery of significant image features is presented. The technique is based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Feature space of any nature can be processed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous, only its class is chosen by the user. Thus, the same program can produce a high quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512 x 512 color image is analyzed in less than 10 seconds on a standard workstation. Gray level images are handled as color images having only the lightness coordinate.
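The mean shift procedure this abstract builds on can be sketched in one dimension with a flat kernel: each point is moved uphill to a density mode by repeatedly replacing it with the local sample mean. Window size and data here are made up for illustration:

```python
def mean_shift_modes(points, window=1.0, tol=1e-6, max_iter=200):
    """Move each point to a density mode: repeatedly replace it by the mean
    of all samples within `window` (flat kernel) until it stops moving."""
    modes = []
    for x in points:
        y = x
        for _ in range(max_iter):
            neigh = [p for p in points if abs(p - y) <= window]
            m = sum(neigh) / len(neigh)
            if abs(m - y) < tol:
                break
            y = m
        modes.append(round(y, 3))
    return modes

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(mean_shift_modes(data))  # [0.1, 0.1, 0.1, 5.1, 5.1, 5.1]
```

Points that converge to the same mode form one cluster; in feature spaces of color images the same idea runs over multi-dimensional pixels.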
Interactive learning using a "society of models"
 SUBMITTED TO SPECIAL ISSUE OF PATTERN RECOGNITION ON IMAGE DATABASE: CLASSIFICATION AND RETRIEVAL
Abstract
Cited by 153 (11 self)
Digital library access is driven by features, but features are often context-dependent and noisy, and their relevance for a query is not always obvious. This paper describes an approach for utilizing many data-dependent, user-dependent, and task-dependent features in a semiautomated tool. Instead of requiring universal similarity measures or manual selection of relevant features, the approach provides a learning algorithm for selecting and combining groupings of the data, where groupings can be induced by highly specialized and context-dependent features. The selection process is guided by a rich example-based interaction with the user. The inherent combinatorics ...
Non-Redundant Data Clustering
, 2004
Abstract
Cited by 72 (3 self)
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We present experimental results for applications in text mining and computer vision.
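The conditional mutual information score that the coordinated conditional information bottleneck maximizes can at least be computed empirically. A minimal plug-in estimator of I(X;Y|Z) from discrete samples (the toy samples below are invented, not from the paper):

```python
from math import log2
from collections import Counter

def conditional_mutual_information(triples):
    """Plug-in estimate of I(X;Y|Z) in bits from (x, y, z) samples:
    sum over (x,y,z) of p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]."""
    n = len(triples)
    pxyz = Counter(triples)
    pxz = Counter((x, z) for x, y, z in triples)
    pyz = Counter((y, z) for x, y, z in triples)
    pz = Counter(z for _, _, z in triples)
    i = 0.0
    for (x, y, z), c in pxyz.items():
        p = c / n
        i += p * log2((p * (pz[z] / n)) / ((pxz[(x, z)] / n) * (pyz[(y, z)] / n)))
    return i

# X is fully determined by Y within each Z stratum -> I(X;Y|Z) = 1 bit here.
samples = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
print(conditional_mutual_information(samples))  # 1.0
```

In the paper's setting the score would be conditioned on the known (to-be-avoided) structure Z, so that a new clustering X is rewarded only for information about Y not already carried by Z.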
Feature Subset Selection and Order Identification for Unsupervised Learning
Abstract
Cited by 71 (4 self)
This paper explores the problem of feature subset selection for unsupervised learning within the wrapper framework. In particular, we examine feature subset selection wrapped around expectation-maximization (EM) clustering with order identification (identifying the number of clusters in the data). We investigate two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. When the "true" number of clusters k is unknown, our experiments on simulated Gaussian data and real data sets show that incorporating the search for k within the feature selection procedure obtains better "class" accuracy than fixing k to be the number of classes. There are two reasons: 1) the "true" number of Gaussian components is not necessarily equal to the number of classes and 2) clustering with different feature subsets can result in different numbers of "true" clusters. Our empirical evaluation shows that feature selection reduces the number of features and improves clustering performance with respect to the chosen performance criteria.
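The wrapper idea can be sketched with heavy simplifications: a deterministic 2-means stands in for EM clustering with order identification, and a crude between/within scatter ratio stands in for the paper's scatter separability criterion. All function names and data are illustrative:

```python
from itertools import combinations
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, iters=20):
    """Minimal 2-means with deterministic extreme-point initialisation
    (a stand-in for the paper's EM clustering)."""
    centers = [list(min(points)), list(max(points))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                  for p in points]
        for j in (0, 1):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

def separability(points, labels):
    """Crude scatter criterion: between-cluster / within-cluster square error."""
    gmean = [sum(c) / len(points) for c in zip(*points)]
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    within = between = 0.0
    for g in groups.values():
        m = [sum(c) / len(g) for c in zip(*g)]
        between += len(g) * dist2(m, gmean)
        within += sum(dist2(p, m) for p in g)
    return between / (within + 1e-12)

def wrapper_select(points):
    """Wrapper search: cluster on every non-empty feature subset and keep the
    subset whose clustering scores best on the criterion."""
    d = len(points[0])
    best = None
    for r in range(1, d + 1):
        for subset in combinations(range(d), r):
            proj = [[p[i] for i in subset] for p in points]
            score = separability(proj, two_means(proj))
            if best is None or score > best[0]:
                best = (score, subset)
    return best[1]

rng = random.Random(1)
# Feature 0 separates two groups; feature 1 is pure noise.
data = [[rng.gauss(0, 0.1), rng.uniform(-1, 1)] for _ in range(20)] + \
       [[rng.gauss(5, 0.1), rng.uniform(-1, 1)] for _ in range(20)]
print(wrapper_select(data))  # (0,)
```

The noisy feature inflates within-cluster scatter without adding between-cluster separation, so the wrapper drops it, which mirrors the abstract's point that evaluating subsets through the clustering itself is what exposes irrelevant features.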
Distribution Free Decomposition of Multivariate Data
 Pattern Analysis and Applications
, 1998
Abstract
Cited by 66 (16 self)
We present a practical approach to nonparametric cluster analysis of large data sets. The number of clusters and the cluster centers are automatically derived by mode seeking with the mean shift procedure on a reduced set of points randomly selected from the data. The cluster boundaries are delineated using a k-nearest neighbor technique. The proposed algorithm is stable and efficient, a 10000 point data set being decomposed in only a few seconds. Complex clustering examples and applications are discussed, and convergence of the gradient ascent mean shift procedure is demonstrated for arbitrary distribution and cardinality of the data. Keywords: Nonparametric cluster analysis, mode seeking, gradient density estimation, mean shift procedure, convergence, range searching.
1. Introduction. In image understanding the feature spaces derived from real data most often have a complex structure and a priori information to guide the analysis may not be available. The significant features whose ...
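The second stage of this approach, delineating cluster boundaries with a k-nearest neighbor rule after only a reduced sample has been labeled by mode seeking, can be sketched as follows. The reduced set below is assumed to be already attached to two found modes; data and labels are invented:

```python
from collections import Counter

def knn_assign(labeled, points, k=3):
    """Assign each point the majority label among its k nearest labeled
    samples (1-D distance here for brevity)."""
    out = []
    for x in points:
        nearest = sorted(labeled, key=lambda lx: abs(lx[0] - x))[:k]
        out.append(Counter(l for _, l in nearest).most_common(1)[0][0])
    return out

# Reduced set: a few samples already assigned to the modes found near 0 and 5.
reduced = [(0.0, "A"), (0.2, "A"), (-0.1, "A"), (5.0, "B"), (5.2, "B"), (4.9, "B")]
print(knn_assign(reduced, [0.1, 2.4, 2.6, 5.05]))  # ['A', 'A', 'B', 'B']
```

Only the small reduced set goes through the expensive mode-seeking step; the bulk of the data is classified by this cheap nearest-neighbor pass, which is what makes the decomposition of a 10000-point set fast.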
Clustering objects on subsets of attributes
 Journal of the Royal Statistical Society
, 2004
Abstract
Cited by 42 (1 self)
Proofs subject to correction. Not to be reproduced without permission. Confidential until read to the Society. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor.
Measuring Dialect Distance Phonetically
 Proceedings of the Third Meeting of the ACL Special Interest Group in Computational Phonology
, 1997
Abstract
Cited by 30 (5 self)
We describe ongoing work in the experimental evaluation of a range of methods for measuring the phonetic distance between the dialectal variants of pronunciations. All are variants of Levenshtein distance, both simple (based on atomic characters) and complex (based on feature vectors). The measurements using feature vectors varied according to whether city-block distance, Euclidean distance or (a measure using) Pearson's correlation coefficient was taken as basic. Variants of these using feature weighting by entropy reduction were systematically compared, as was the representation of diphthongs (as one symbol or two). The results were compared to well-established scholarship in dialectology, yielding a calibration of the method. These results indicate that feature representations are more sensitive, that city-block distance is a good measure of phonetic overlap of feature vectors, that weighting is not useful, and that two-phone representations of diphthongs provide a more satisfactory base for this sort of comparison.
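The core combination this abstract evaluates, Levenshtein distance with feature-vector substitution costs, can be sketched directly. The phone feature values below are invented for illustration, and insertions/deletions are given unit cost as a simplification:

```python
def weighted_levenshtein(a, b, cost):
    """Levenshtein distance where substitution cost comes from a
    feature-distance function `cost(x, y)`; indels cost 1."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                             # deletion
                d[i][j - 1] + 1,                             # insertion
                d[i - 1][j - 1] + cost(a[i - 1], b[j - 1]),  # substitution
            )
    return d[m][n]

# Hypothetical phone features (height, backness, rounding), each in [0, 1];
# substitution cost is the city-block distance between feature vectors,
# so identical phones cost 0 and similar phones cost little.
FEATURES = {"i": (0.0, 0.0, 0.0), "e": (0.3, 0.1, 0.0), "u": (0.0, 1.0, 1.0)}
city_block = lambda x, y: sum(abs(p - q) for p, q in zip(FEATURES[x], FEATURES[y]))

print(weighted_levenshtein("ie", "ee", city_block))  # 0.4: one cheap i -> e
```

With atomic characters the same pair of strings would score a full 1.0; graded feature costs are what lets the measure treat phonetically close variants as close, which is the sensitivity the abstract reports for feature representations.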
Identifying Cohesive Subgroups
 Social Networks
, 1995
Abstract
Cited by 23 (4 self)
Cohesive subgroups have always represented an important construct for sociologists who study individuals and organizations. In this article, I apply recent advances in the statistical modelling of social network data to the task of identifying cohesive subgroups from social network data. Further, through simulated data, I describe a process for obtaining the probability that a given sample of data could have been obtained from a network in which actors were no more likely to engage in interaction with subgroup members than with members of other subgroups. I obtain the probability for a specific data set, and then, through further simulations, develop a model which can be applied to future data sets. Also through simulated data, I characterize the extent to which a simple hill-climbing algorithm recovers known subgroup memberships. I apply the algorithm to data indicating the extent of professional discussion among teachers in a high school, and I show the relationship ...
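A hill-climbing recovery of subgroup memberships like the one evaluated here can be sketched as a greedy swap search over a two-subgroup partition, maximizing the number of within-subgroup ties. This omits the article's statistical modelling entirely; the planted network and names below are invented:

```python
def within_group_ties(ties, groups):
    """Count ties whose two actors share a subgroup label."""
    return sum(1 for u, v in ties if groups[u] == groups[v])

def hill_climb(ties, n, groups, sweeps=50):
    """Greedy hill climbing over pair swaps (subgroup sizes stay fixed):
    exchange two actors' labels whenever that raises the within-subgroup
    tie count; stop after a full sweep with no improvement."""
    score = within_group_ties(ties, groups)
    for _ in range(sweeps):
        improved = False
        for u in range(n):
            for v in range(u + 1, n):
                if groups[u] == groups[v]:
                    continue
                groups[u], groups[v] = groups[v], groups[u]
                s = within_group_ties(ties, groups)
                if s > score:
                    score, improved = s, True
                else:
                    groups[u], groups[v] = groups[v], groups[u]  # revert
        if not improved:
            break
    return groups, score

# Two planted subgroups {0,1,2} and {3,4,5} joined by one cross-group tie.
ties = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups, score = hill_climb(ties, 6, [0, 1, 0, 1, 0, 1])
print(groups, score)  # recovers {0,1,2} vs {3,4,5}
```

From a deliberately scrambled start, the swap search recovers the planted partition (up to label symmetry), keeping only the single cross-group tie cut, which is the kind of recovery behavior the article characterizes on simulated data.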
Concept Learning and Feature Selecting Based on Square-Error Clustering
 Machine Learning
, 1999
Abstract
Cited by 16 (4 self)
Based on a reinterpretation of the square-error criterion for classical clustering, a "separate-and-conquer" version of K-Means clustering is presented and a contribution weight is determined for each variable of every cluster. The weight is used to produce conjunctive concepts that describe clusters and to reduce or transform the variable (feature) space. Keywords: Clustering, variable weights, conjunctive concepts, feature selection, feature space transformation.
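One plausible reading of per-variable contribution weights can be sketched as the share of a cluster's square-error reduction credited to each variable. This is a loose proxy, not the paper's exact formula, and the data below are invented:

```python
def contribution_weights(points, labels, k):
    """For each cluster, credit each variable with its share of the cluster's
    square-error reduction (global-mean error minus cluster-mean error),
    normalised to sum to 1 per cluster."""
    d = len(points[0])
    gmean = [sum(p[v] for p in points) / len(points) for v in range(d)]
    weights = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        cmean = [sum(p[v] for p in members) / len(members) for v in range(d)]
        # per-variable reduction in square error from using the cluster mean
        red = [sum((p[v] - gmean[v]) ** 2 - (p[v] - cmean[v]) ** 2
                   for p in members) for v in range(d)]
        total = sum(red) or 1.0
        weights.append([r / total for r in red])
    return weights

# Variable 0 drives the split; variable 1 is identical everywhere.
pts = [[0, 7], [0, 7], [4, 7], [4, 7]]
print(contribution_weights(pts, [0, 0, 1, 1], 2))  # [[1.0, 0.0], [1.0, 0.0]]
```

A variable that gets all the weight in a cluster is the one worth keeping in a conjunctive concept describing that cluster, while zero-weight variables can be dropped from the feature space, matching the abstract's use of the weights.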