Data Clustering: A Review
- ACM COMPUTING SURVEYS
, 1999
Cited by 912 (9 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
Automatically characterizing large scale program behavior
, 2002
Cited by 520 (39 self)
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.
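The Basic Block Vector idea in this abstract can be illustrated with a minimal sketch. The abstract does not define the vector, so this assumes the simplest reading: count how often each basic block executes in an interval, normalize the counts to sum to 1, and compare intervals by Manhattan distance. The function names, the toy traces, and the choice of distance are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def basic_block_vector(trace, num_blocks):
    """Return a normalized execution-frequency vector for one interval.

    `trace` is a list of basic-block IDs (0 .. num_blocks - 1) in the
    order they executed. Assumption: plain execution counts, normalized
    so the entries sum to 1.
    """
    counts = Counter(trace)
    total = len(trace)
    if total == 0:
        return [0.0] * num_blocks
    return [counts.get(block, 0) / total for block in range(num_blocks)]


def manhattan_distance(u, v):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical traces: intervals A and B run the same loop, C runs other code.
interval_a = [0, 1, 1, 2]
interval_b = [0, 1, 1, 2, 0, 1, 1, 2]
interval_c = [3, 3, 3, 3]

bbv_a = basic_block_vector(interval_a, 4)
bbv_b = basic_block_vector(interval_b, 4)
bbv_c = basic_block_vector(interval_c, 4)

same_phase = manhattan_distance(bbv_a, bbv_b)   # identical block mix
other_phase = manhattan_distance(bbv_a, bbv_c)  # disjoint block mix
```

Because the vectors are normalized, intervals of different lengths that execute the same mix of blocks land on the same point, which is what lets a clustering algorithm group them into one phase.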
Robust multiresolution estimation of parametric motion models
- Journal of Visual Communication and Image Representation
, 1995
Cited by 220 (40 self)
This paper describes a method to estimate parametric motion models. Motivations for the use of such models are on one hand their efficiency, which has been demonstrated in numerous contexts such as estimation, segmentation, tracking and interpretation of motion, and on the other hand, their low computational cost compared to optical flow estimation. However, it is important to have the best accuracy for the estimated parameters, and to take into account the problem of multiple motion. We have therefore developed two robust estimators in a multiresolution framework. Numerical results support this approach, as validated by the use of these algorithms on complex sequences.
Approximation Algorithms for Projective Clustering
- Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
Cited by 196 (14 self)
We consider the following two instances of the projective clustering problem: Given a set $S$ of $n$ points in $\mathbb{R}^d$ and an integer $k > 0$, cover $S$ by $k$ hyper-strips (resp. hyper-cylinders) so that the maximum width of a hyper-strip (resp., the maximum diameter of a hyper-cylinder) is minimized. Let $w^*$ be the smallest value so that $S$ can be covered by $k$ hyper-strips (resp. hyper-cylinders), each of width (resp. diameter) at most $w^*$. In the plane, the two problems are equivalent. It is NP-Hard to compute $k$ planar strips of width even at most $Cw^*$, for any constant $C > 0$ [50]. This paper contains four main results related to projective clustering: (i) For $d = 2$, we present a randomized algorithm that computes $O(k \log k)$ strips of width at most $6w^*$ that cover $S$. Its expected running time is $O(nk^2 \log^4 n)$ if $k^2 \log k \le n$; it also works for larger values of $k$, but then the expected running time is $O(n^{2/3} k^{8/3} \log^4 n)$. We also propose another algorithm that computes a c...
Robust Analysis of Feature Spaces: Color Image Segmentation
, 1997
Cited by 152 (5 self)
A general technique for the recovery of significant image features is presented. The technique is based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Feature space of any nature can be processed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous, only its class is chosen by the user. Thus, the same program can produce a high quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512 x 512 color image is analyzed in less than 10 seconds on a standard workstation. Gray level images are handled as color images having only the lightness coordinate.
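The mean shift procedure named in this abstract can be sketched in a few lines. This is a toy version under stated assumptions: one-dimensional data, a flat (uniform) kernel, and a hand-picked `bandwidth`. It is not the paper's segmentation pipeline, and the function name and sample values are hypothetical.

```python
def mean_shift_1d(points, bandwidth, tol=1e-6, max_iter=100):
    """Move each point uphill in density until it settles on a mode.

    Each step replaces the current position with the mean of all data
    points within `bandwidth` of it (flat kernel). Returns one converged
    position per input point.
    """
    modes = []
    for x in points:
        for _ in range(max_iter):
            # The window is never empty: it starts at a data point, and the
            # mean of a window always lies within `bandwidth` of one of them.
            window = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


# Hypothetical feature values forming two well-separated groups.
samples = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
modes = mean_shift_1d(samples, bandwidth=1.0)

# Points that converge to the same mode belong to the same cluster.
distinct_modes = sorted(set(round(m, 3) for m in modes))
```

The number of clusters is not an input here; it falls out of how many distinct modes the points converge to, which is governed by the bandwidth.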
Comparison of texture features based on Gabor filters
- IEEE Trans. on Image Processing
Cited by 71 (2 self)
Texture features that are based on the local power spectrum obtained by a bank of Gabor filters are compared. The features differ in the type of nonlinear post-processing which is applied to the local power spectrum. The following features are considered: Gabor energy, complex moments, and grating cell operator features. The capability of the corresponding operators to produce distinct feature vector clusters for different textures is compared using two methods: the Fisher criterion and the classification result comparison. Both methods give consistent results. The grating cell operator gives the best discrimination and segmentation results. The texture detection capabilities of the operators and their robustness to nontexture features are also compared. The grating cell operator is the only one that selectively responds only to texture and does not give false response to nontexture features such as object contours. Index Terms: Classification, complex moments, discrimination, ...
A Robust Competitive Clustering Algorithm with Applications in Computer Vision
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Cited by 61 (3 self)
This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed Robust Competitive Agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration. Noise immunity is achieved by incorporating concepts from robust statistics into the algorithm. RCA assigns two different sets of weights for each data point: the first set of constrained weights represents degrees of sharing, and is used to create a competitive environment and to generate a fuzzy partition of the data set. The second set corresponds to robust weights, and is used to obtain robust estimates of the cluster prototypes. By choosing an appropriate distance measure in the objective function, RCA can be used to find a...
Testing of Clustering
- In Proc. 41st Annu. IEEE Sympos. Found. Comput. Sci
, 2000
Cited by 51 (11 self)
A set $X$ of points in $\mathbb{R}^d$ is $(k, b)$-clusterable if $X$ can be partitioned into $k$ subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most $b$. We present algorithms that, by sampling from a set $X$, distinguish between the case that $X$ is $(k, b)$-clusterable and the case that $X$ is $\epsilon$-far from being $(k, b')$-clusterable for any given $0 < \epsilon \le 1$ and for $b' \ge b$. By $\epsilon$-far from being $(k, b')$-clusterable we mean that more than $\epsilon \cdot |X|$ points should be removed from $X$ so that it becomes $(k, b')$-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of $|X|$, and polynomial in $k$ and $1/\epsilon$. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an $\epsilon$-fraction of the points in $X$ that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independ...
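The definition of $(k, b)$-clusterability under the diameter cost can be made concrete with a small checker. This sketch only verifies a given partition against the definition; it is not the sampling-based tester the abstract describes. The function names and sample points are hypothetical, and allowing fewer than $k$ non-empty clusters is an assumption (the remaining subsets can be taken as empty).

```python
import math
from itertools import combinations


def diameter(cluster):
    """Largest Euclidean distance between any two points in the cluster."""
    return max(
        (math.dist(p, q) for p, q in combinations(cluster, 2)),
        default=0.0,  # a cluster with fewer than two points has diameter 0
    )


def is_valid_kb_clustering(clusters, k, b):
    """Check that a partition witnesses (k, b)-clusterability.

    True when there are at most k clusters and every cluster has
    diameter at most b.
    """
    return len(clusters) <= k and all(diameter(c) <= b for c in clusters)


# Hypothetical partition of four points in the plane into two tight groups.
partition = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0)],
]

ok_at_b1 = is_valid_kb_clustering(partition, k=2, b=1.0)
too_tight = is_valid_kb_clustering(partition, k=2, b=0.5)
too_few = is_valid_kb_clustering(partition, k=1, b=1.0)
```

Checking one partition this way costs time quadratic in the cluster sizes, which is the kind of dependence on $|X|$ that a sampling-based test avoids.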
Exact and Approximation Algorithms for Clustering
, 1997
Cited by 48 (4 self)
In this paper we present an $n^{O(k^{1-1/d})}$ time algorithm for solving the $k$-center problem in $\mathbb{R}^d$, under $L_\infty$ and $L_2$ metrics. The algorithm extends to other metrics, and can be used to solve the discrete $k$-center problem, as well. We also describe a simple $(1 + \epsilon)$-approximation algorithm for the $k$-center problem, with running time $O(n \log k) + (k/\epsilon)^{O(k^{1-1/d})}$. Finally, we present an $n^{O(k^{1-1/d})}$ time algorithm for solving the $L$-capacitated $k$-center problem, provided that $L = \Omega(n/k^{1-1/d})$ or $L = O(1)$. We conclude with a simple approximation algorithm for the $L$-capacitated $k$-center problem. The work on this paper was partially supported by a National Science Foundation Grant CCR-93-01259, by an Army Research Office MURI grant DAAH04-96-1-0013, by a Sloan fellowship, by an NYI award and matching funds from Xerox Corporation, and by a grant from the U.S.-Israeli Binational Science Foundation. † Department of Computer Science, Box ...
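For readers unfamiliar with the $k$-center objective (choose $k$ centers so that the largest distance from any point to its nearest center is as small as possible), the following sketch may help. It is not one of the algorithms from this paper: it uses the classical farthest-first traversal (Gonzalez's greedy heuristic), which gives a 2-approximation in any metric. The function name and sample points are hypothetical.

```python
import math


def farthest_first_centers(points, k):
    """Greedy 2-approximation for k-center (farthest-first traversal).

    Starts from the first point, then repeatedly adds the point that is
    farthest from all centers chosen so far. Returns the centers and the
    resulting covering radius.
    """
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers, max(dist)


# Hypothetical input: two pairs of nearby points on a line.
sample_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, radius = farthest_first_centers(sample_points, k=2)
```

The greedy method runs in $O(nk)$ distance evaluations but only guarantees a factor of 2, whereas the abstract's algorithms trade more running time for an exact or $(1 + \epsilon)$-approximate answer.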
Fast Segmentation of Range Images into Planar Regions by Scan Line Grouping
- Machine Vision and Applications
, 1994
Cited by 45 (6 self)
In this paper we present a novel technique for rapidly partitioning surfaces in range images into planar patches. Essential to our segmentation method is the observation that, in a scan line, the points belonging to a planar surface form a straight line segment. On the other hand, all points on a straight line segment surely belong to the same planar surface. Based on this observation, we first divide each scan line into straight line segments and subsequently consider only the set of line segments of all scan lines as segmentation primitives. We have developed a simple link-based data structure to efficiently represent line segments and their neighborhood relationship. The principle of our segmentation method is region growing. Three neighboring line segments satisfying an optimality criterion are selected as a seed region, and the region is then grown around this seed. We use a noise variance estimation to automatically set some thresholds so that the algorithm can adapt ...

