Results 1–10 of 69
Data Clustering: A Review
 ACM COMPUTING SURVEYS
, 1999
Abstract

Cited by 1284 (13 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts across communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.
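The abstract's definition of clustering as unsupervised grouping can be illustrated with the most common partitional method, k-means (Lloyd's algorithm). This is a generic sketch, not an algorithm taken from the survey itself; the point set and parameters are invented for illustration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: a classic partitional clustering method."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centers, clusters

pts = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
centers, clusters = kmeans(pts, k=2)
```

Even this toy version shows the sensitivity to initialization and to the choice of k that the survey discusses.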
Automatically characterizing large scale program behavior
, 2002
Abstract

Cited by 619 (41 self)
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback-directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware-independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large-scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.
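As a rough illustration of the Basic Block Vector idea (not the paper's implementation), one can summarize each fixed-length interval of an executed-block trace by its normalized block-frequency vector and compare intervals by Manhattan distance. The toy trace and interval length below are invented.

```python
from collections import Counter

def basic_block_vectors(trace, interval):
    """Split a trace of executed basic-block IDs into fixed-length intervals
    and summarize each interval by its normalized block-frequency vector."""
    blocks = sorted(set(trace))
    vectors = []
    for start in range(0, len(trace), interval):
        counts = Counter(trace[start:start + interval])
        total = sum(counts.values())
        vectors.append([counts.get(b, 0) / total for b in blocks])
    return vectors

def manhattan(u, v):
    """Distance between two interval summaries."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Toy trace: phase A repeats blocks 1 and 2; phase B repeats blocks 3 and 4.
trace = [1, 2, 1, 2, 1, 2] + [3, 4, 3, 4, 3, 4]
vecs = basic_block_vectors(trace, interval=6)
```

Intervals from the same program phase yield similar vectors, so clustering these vectors (as the paper does) groups execution into phases.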
Robust multiresolution estimation of parametric motion models
 Journal of Visual Communication and Image Representation
, 1995
Abstract

Cited by 274 (48 self)
This paper describes a method to estimate parametric motion models. Motivations for the use of such models are, on one hand, their efficiency, which has been demonstrated in numerous contexts such as estimation, segmentation, tracking, and interpretation of motion, and on the other hand, their low computational cost compared to optical flow estimation. However, it is important to obtain the best accuracy for the estimated parameters, and to take into account the problem of multiple motions. We have therefore developed two robust estimators in a multiresolution framework. Numerical results support this approach, as validated by the use of these algorithms on complex sequences.
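The paper's estimators are not reproduced here, but the general flavor of robust estimation, iteratively reweighted least squares (IRLS) with a Tukey biweight, can be sketched on a toy 1-D "motion" example. The cutoff c, the median initialization, and the data are illustrative assumptions, not the paper's settings.

```python
def tukey_weight(r, c):
    """Tukey biweight: smoothly downweights residuals, zero beyond cutoff c."""
    if abs(r) >= c:
        return 0.0
    u = (r / c) ** 2
    return (1.0 - u) ** 2

def robust_location(samples, c=1.0, iters=10):
    """IRLS estimate of a 1-D location (e.g., a dominant translation),
    started from the median so outliers cannot capture the fit."""
    est = sorted(samples)[len(samples) // 2]
    for _ in range(iters):
        w = [tukey_weight(s - est, c) for s in samples]
        total = sum(w)
        if total == 0:
            break
        est = sum(wi * s for wi, s in zip(w, samples)) / total
    return est

# Dominant motion near 2.0 plus one outlier from a secondary motion.
displacements = [1.9, 2.0, 2.1, 2.0, 10.0]
```

The robust estimate stays at the dominant motion, while a plain least-squares mean is pulled toward the outlier, which is exactly the multiple-motion problem the abstract raises.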
Approximation Algorithms for Projective Clustering
 Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia
, 2000
Abstract

Cited by 246 (21 self)
We consider the following two instances of the projective clustering problem: Given a set S of n points in R^d and an integer k > 0, cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp. the maximum diameter of a hypercylinder) is minimized. Let w* be the smallest value so that S can be covered by k hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most w*. In the plane, the two problems are equivalent. It is NP-hard to compute k planar strips of width even at most Cw*, for any constant C > 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w* that cover S. Its expected running time is O(nk^2 log^4 n) if k^2 log k ≤ n; it also works for larger values of k, but then the expected running time is O(n^{2/3} k^{8/3} log^4 n). We also propose another algorithm that computes a c...
Robust Analysis of Feature Spaces: Color Image Segmentation
, 1997
Abstract

Cited by 186 (6 self)
A general technique for the recovery of significant image features is presented. The technique is based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Feature space of any nature can be processed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous; only its class is chosen by the user. Thus, the same program can produce a high-quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512 x 512 color image is analyzed in less than 10 seconds on a standard workstation. Gray-level images are handled as color images having only the lightness coordinate.
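A minimal 1-D sketch of the mean shift procedure the abstract refers to: each point is moved repeatedly to the kernel-weighted mean of the data, climbing the density gradient until it settles on a mode; points sharing a mode form one cluster. The Gaussian kernel, bandwidth, and data below are illustrative choices, not the paper's settings.

```python
import math

def mean_shift(points, bandwidth=1.0, steps=50):
    """Mode seeking: iterate each point to the kernel-weighted mean of the data."""
    modes = []
    for x in points:
        for _ in range(steps):
            weights = [math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2))
                       for p in points]
            x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        modes.append(round(x, 3))  # round so points at the same mode compare equal
    return modes

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
modes = mean_shift(data, bandwidth=0.5)
```

No number of clusters is specified in advance; the two modes emerge from the density itself, which is the nonparametric appeal the abstract emphasizes.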
Comparison of texture features based on Gabor filters
 IEEE Trans. on Image Processing
Abstract

Cited by 101 (5 self)
Texture features that are based on the local power spectrum obtained by a bank of Gabor filters are compared. The features differ in the type of nonlinear post-processing which is applied to the local power spectrum. The following features are considered: Gabor energy, complex moments, and grating cell operator features. The capability of the corresponding operators to produce distinct feature vector clusters for different textures is compared using two methods: the Fisher criterion and the classification result comparison. Both methods give consistent results. The grating cell operator gives the best discrimination and segmentation results. The texture detection capabilities of the operators and their robustness to nontexture features are also compared. The grating cell operator is the only one that selectively responds only to texture and does not give false response to nontexture features such as object contours. Index Terms: Classification, complex moments, discrimination, ...
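The Gabor energy feature mentioned above can be sketched in 1-D as the summed squared responses of a quadrature (even/odd) Gabor pair. Filter parameters and the test signal below are illustrative, not the paper's.

```python
import math

def gabor_energy(signal, freq, sigma=2.0):
    """Gabor energy at each valid position: squared responses of an
    even (cosine) and odd (sine) Gabor filter, summed."""
    half = int(3 * sigma)
    taps = range(-half, half + 1)
    env = [math.exp(-t * t / (2 * sigma * sigma)) for t in taps]
    even = [e * math.cos(2 * math.pi * freq * t) for e, t in zip(env, taps)]
    odd = [e * math.sin(2 * math.pi * freq * t) for e, t in zip(env, taps)]
    out = []
    for i in range(half, len(signal) - half):
        re = sum(signal[i + t] * even[t + half] for t in taps)
        im = sum(signal[i + t] * odd[t + half] for t in taps)
        out.append(re * re + im * im)
    return out

# A texture-like sinusoid responds strongly only near its own frequency.
tex = [math.sin(2 * math.pi * 0.25 * i) for i in range(40)]
e_match = gabor_energy(tex, freq=0.25)
e_miss = gabor_energy(tex, freq=0.05)
```

Because the energy combines the quadrature pair, it is roughly phase-invariant, which is why it clusters well per texture, the property the paper evaluates.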
A Robust Competitive Clustering Algorithm with Applications in Computer Vision
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Abstract

Cited by 81 (3 self)
This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed Robust Competitive Agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration. Noise immunity is achieved by incorporating concepts from robust statistics into the algorithm. RCA assigns two different sets of weights for each data point: the first set of constrained weights represents degrees of sharing, and is used to create a competitive environment and to generate a fuzzy partition of the data set. The second set corresponds to robust weights, and is used to obtain robust estimates of the cluster prototypes. By choosing an appropriate distance measure in the objective function, RCA can be used to find a...
Testing of Clustering
 In Proc. 41st Annu. IEEE Sympos. Found. Comput. Sci.
, 2000
Abstract

Cited by 60 (13 self)
A set X of points in R^d is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is ε-far from being (k, b')-clusterable, for any given 0 < ε ≤ 1 and for b' ≥ b. By ε-far from being (k, b')-clusterable we mean that more than ε·|X| points should be removed from X so that it becomes (k, b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X|, and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independ...
Exact and Approximation Algorithms for Clustering
, 1997
Abstract

Cited by 57 (5 self)
In this paper we present an n^{O(k^{1-1/d})}-time algorithm for solving the k-center problem in R^d, under the L∞ and L2 metrics. The algorithm extends to other metrics, and can be used to solve the discrete k-center problem as well. We also describe a simple (1 + ε)-approximation algorithm for the k-center problem, with running time O(n log k) + (k/ε)^{O(k^{1-1/d})}. Finally, we present an n^{O(k^{1-1/d})}-time algorithm for solving the L-capacitated k-center problem, provided that L = Ω(n/k^{1-1/d}) or L = O(1). We conclude with a simple approximation algorithm for the L-capacitated k-center problem. The work on this paper was partially supported by a National Science Foundation Grant CCR-9301259, by an Army Research Office MURI grant DAAH04-96-1-0013, by a Sloan fellowship, by an NYI award and matching funds from Xerox Corporation, and by a grant from the U.S.-Israeli Binational Science Foundation.
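The exact algorithms above are involved, but the classic furthest-point (Gonzalez) traversal, a simple 2-approximation for k-center, conveys the problem. This standard textbook method is shown for orientation only and is not one of the paper's algorithms; the point set is invented.

```python
def dist(p, q):
    """Euclidean (L2) distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def gonzalez_k_center(points, k):
    """Furthest-point traversal: repeatedly add the point furthest from
    its nearest chosen center; achieves radius at most 2x optimal."""
    centers = [points[0]]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(far)
    radius = max(min(dist(p, c) for c in centers) for p in points)
    return centers, radius

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
centers, radius = gonzalez_k_center(pts, k=2)
```

Its O(nk) running time contrasts with the n^{O(k^{1-1/d})} exact algorithm: the paper's contribution is closing the gap between such cheap 2-approximations and exact or (1 + ε)-approximate solutions.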
Fast Segmentation of Range Images into Planar Regions by Scan Line Grouping
 Machine Vision and Applications
, 1994
Abstract

Cited by 50 (6 self)
In this paper we present a novel technique for rapidly partitioning surfaces in range images into planar patches. Essential for our segmentation method is the observation that, in a scan line, the points belonging to a planar surface form a straight line segment. On the other hand, all points on a straight line segment surely belong to the same planar surface. Based on this observation, we first divide each scan line into straight line segments and subsequently consider only the set of line segments of all scan lines as segmentation primitives. We have developed a simple link-based data structure to efficiently represent line segments and their neighborhood relationship. The principle of our segmentation method is region growing. Three neighboring line segments satisfying an optimality criterion are selected as a seed region, and then a growing is carried out around the seed region. We use a noise variance estimation to automatically set some thresholds so that the algorithm can adapt ...
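The division of a scan line into straight segments can be sketched with a recursive split at the sample of maximum deviation from the chord (a Douglas-Peucker-style stand-in, not the paper's actual procedure); the tolerance and the roof-shaped depth profile are invented.

```python
def split_into_lines(depths, tol=0.1):
    """Recursively split a scan line of range (depth) values into maximal
    straight segments, returned as (start, end) index pairs."""
    def deviation(i, j):
        # Sample in i..j furthest from the chord through the endpoints.
        best_k, best_d = i, 0.0
        for k in range(i + 1, j):
            t = (k - i) / (j - i)
            chord = depths[i] + t * (depths[j] - depths[i])
            d = abs(depths[k] - chord)
            if d > best_d:
                best_k, best_d = k, d
        return best_k, best_d

    def rec(i, j):
        if j - i < 2:
            return [(i, j)]
        k, d = deviation(i, j)
        if d <= tol:
            return [(i, j)]
        return rec(i, k) + rec(k, j)

    return rec(0, len(depths) - 1)

# A roof-shaped profile: two planar faces meeting at index 5.
profile = [float(i) for i in range(6)] + [4.0, 3.0, 2.0, 1.0, 0.0]
segments = split_into_lines(profile)
```

The resulting segments (which share the crease point) would then serve as the region-growing primitives described in the abstract.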