Results 1–10 of 61
From Contours to Regions: An Empirical Evaluation
Abstract
Cited by 82 (6 self)
We propose a generic grouping algorithm that constructs a hierarchy of regions from the output of any contour detector. Our method consists of two steps, an Oriented Watershed Transform (OWT) to form initial regions from contours, followed by construction of an Ultrametric Contour Map (UCM) defining a hierarchical segmentation. We provide extensive experimental evaluation to demonstrate that, when coupled to a high-performance contour detector, the OWT-UCM algorithm produces state-of-the-art image segmentations. These hierarchical segmentations can optionally be further refined by user-specified annotations.
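The defining property of a UCM is that thresholding it at any level yields a valid segmentation, and raising the threshold only merges regions, never splits them. A minimal sketch of that thresholding step on a toy pixel grid (the data structures and names here are illustrative, not the authors' implementation):

```python
from collections import deque

def segment(ucm_edges, width, height, threshold):
    """Label grid pixels into regions, treating an edge between two
    4-adjacent pixels as a boundary when its UCM strength is at least
    `threshold`. ucm_edges maps a frozenset of two (x, y) pixels to a
    strength in [0, 1]; missing edges have strength 0."""
    labels = {}
    next_label = 0
    for start in ((x, y) for y in range(height) for x in range(width)):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:  # BFS flood fill within the current region
            x, y = queue.popleft()
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in labels:
                    strength = ucm_edges.get(frozenset([(x, y), (nx, ny)]), 0.0)
                    if strength < threshold:  # weak boundary: same region
                        labels[(nx, ny)] = next_label
                        queue.append((nx, ny))
        next_label += 1
    return labels

# A 2x1 grid split by a single boundary of strength 0.5:
edges = {frozenset([(0, 0), (1, 0)]): 0.5}
low = segment(edges, 2, 1, threshold=0.9)   # boundary dropped -> 1 region
high = segment(edges, 2, 1, threshold=0.3)  # boundary kept    -> 2 regions
```

Raising the threshold discards weaker contours, so regions can only merge as the hierarchy is ascended.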
Toward Objective Evaluation of Image Segmentation Algorithms
, 2007
Abstract
Cited by 78 (2 self)
Unsupervised image segmentation is an important component in many image understanding algorithms and practical vision systems. However, evaluation of segmentation algorithms thus far has been largely subjective, leaving a system designer to judge the effectiveness of a technique based only on intuition and results in the form of a few example segmented images. This is largely due to image segmentation being an ill-defined problem: there is no unique ground-truth segmentation of an image against which the output of an algorithm may be compared. This paper demonstrates how a recently proposed measure of similarity, the Normalized Probabilistic Rand (NPR) index, can be used to perform a quantitative comparison between image segmentation algorithms using a hand-labeled set of ground-truth segmentations. We show that the measure allows principled comparisons between segmentations created by different algorithms, as well as segmentations on different images. We outline a procedure for algorithm evaluation through an example evaluation of some familiar algorithms: the mean-shift-based algorithm, an efficient graph-based segmentation algorithm, a hybrid algorithm that combines the strengths of both methods, and expectation maximization. Results are presented on the 300 images in the publicly available Berkeley Segmentation Data Set.
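The NPR index normalizes the classic Rand index against a set of ground-truth segmentations; the unnormalized core is simple to state. A minimal sketch (not the paper's implementation, which adds probabilistic weighting and normalization):

```python
from itertools import combinations

def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which two segmentations agree:
    both place the pair in the same region, or both separate it."""
    assert len(seg_a) == len(seg_b)
    pairs = list(combinations(range(len(seg_a)), 2))
    agree = sum((seg_a[i] == seg_a[j]) == (seg_b[i] == seg_b[j])
                for i, j in pairs)
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # identical labelings -> 1.0
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 2 of 6 pairs agree  -> 1/3
```

Note the index is invariant to label permutations, which is why it suits comparisons across algorithms that number their regions arbitrarily.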
Approximate Clustering without the Approximation
Abstract
Cited by 35 (18 self)
The design of approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also give us more accurate clusterings. E.g., for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown "correct" target clustering, and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth. In this paper, we show that if we make this implicit assumption explicit, that is, if we assume that any c-approximation to the given clustering objective F is ε-close to the target, then we can produce clusterings that are O(ε)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a somewhat surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target, and assuming that any approximately optimal solution is ε-close to the target, even for an approximation factor of, say, c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.
Unsupervised Segmentation of Natural Images via Lossy Data Compression
, 2007
Abstract
Cited by 32 (2 self)
In this paper, we cast natural-image segmentation as a problem of clustering texture features as multivariate mixed data. We model the distribution of the texture features using a mixture of Gaussian distributions. Unlike most existing clustering methods, we allow the mixture components to be degenerate or nearly degenerate. We contend that this assumption is particularly important for mid-level image segmentation, where degeneracy is typically introduced by using a common feature representation for different textures in an image. We show that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach. Using either 2-D texture filter banks or simple fixed-size windows to obtain texture features, the algorithm effectively segments an image by minimizing the overall coding length of the feature vectors. We conduct comprehensive experiments to measure the performance of the algorithm in terms of visual evaluation and a variety of quantitative indices for image segmentation. The algorithm compares favorably against other well-known image-segmentation methods on the Berkeley image database.
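The merge criterion behind such lossy-coding agglomeration can be sketched in one dimension. Hedged heavily: the formula below is the 1-D specialization of the lossy coding length as I recall it from the related lossy-data-coding literature, with an invented distortion parameter `eps`; treat it as illustrative of the idea (merge two groups when coding them jointly is cheaper), not as the paper's exact objective:

```python
import math

def coding_length(xs, eps=0.1):
    """Approximate bits to code roughly zero-mean scalars xs up to
    distortion eps (1-D sketch of a lossy coding length; the mean term
    of the full formula is glossed over here)."""
    m = len(xs)
    return (m + 1) / 2 * math.log2(1 + sum(x * x for x in xs) / (eps * eps * m))

def merge_gain(a, b, eps=0.1):
    """Bits saved by coding groups a and b together rather than separately."""
    return coding_length(a, eps) + coding_length(b, eps) - coding_length(a + b, eps)

tight = [0.01, -0.02, 0.015]   # samples from one tight cluster near 0
far = [5.0, 5.1, 4.9]          # a distant cluster
print(merge_gain(tight, [0.02, -0.01]) > 0)  # similar groups: merging saves bits
print(merge_gain(tight, far) > 0)            # distant groups: merging costs bits
```

Greedy agglomeration then repeatedly applies the merge with the largest positive gain and stops when no merge saves bits, which is what ties the number of segments to the data rather than to a preset k.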
Graph partitioning by spectral rounding: Applications in image segmentation and clustering
 In CVPR
, 2006
Abstract
Cited by 25 (3 self)
We introduce a family of spectral partitioning methods. Edge separators of a graph are produced by iteratively reweighting the edges until the graph disconnects into the prescribed number of components. At each iteration a small number of eigenvectors with small eigenvalue are computed and used to determine the reweighting. In this way spectral rounding directly produces discrete solutions, whereas current spectral algorithms must map the continuous eigenvectors to discrete solutions by employing a heuristic geometric separator (e.g., k-means). We show that spectral rounding compares favorably to current spectral approximations on the Normalized Cut criterion (NCut). Results are given for natural image segmentation, medical image segmentation, and clustering. A practical version is shown to converge.
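For contrast, here is a minimal sketch of the continuous-relaxation-plus-rounding baseline the abstract argues against: compute a low eigenvector of the graph Laplacian and round it by sign (names and the toy graph are invented for illustration):

```python
import numpy as np

def fiedler_bipartition(W):
    """Bipartition a graph by the signs of the second-smallest eigenvector
    of its (unnormalized) Laplacian -- the classic continuous-then-round
    step that spectral rounding replaces with iterative edge reweighting."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvector
    return fiedler >= 0

# Two triangles (nodes 0-2 and 3-5) joined by one weak bridge edge:
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                   # weak bridge
side = fiedler_bipartition(W)
print(side[:3], side[3:])                 # the two triangles separate
```

Sign rounding works cleanly here, but on less separable graphs the continuous eigenvector has no obvious discrete structure, which is the gap the spectral rounding iteration is designed to close.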
Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering
Abstract
Cited by 23 (5 self)
Clustering is an important data mining task for exploration and visualization of different data types like news stories, scientific publications, and weblogs. Due to the evolving nature of these data, evolutionary clustering, also known as dynamic clustering, has recently emerged to cope with the challenges of mining temporally smooth clusters over time. A good evolutionary clustering algorithm should fit the data well at each time epoch and, at the same time, result in a smooth cluster evolution that provides the data analyst with a coherent and easily interpretable model. In this paper we introduce the temporal Dirichlet process mixture model (TDPM) as a framework for evolutionary clustering. TDPM is a generalization of the DPM framework for clustering that automatically grows the number of clusters with the data. In our framework, the data is divided into epochs; all data points inside the same epoch are assumed to be fully exchangeable, whereas the temporal order is maintained across epochs. Moreover, the number of clusters in each epoch is unbounded: clusters can be retained, die out, or emerge over time, and the actual parameterization of each cluster can also evolve over time in a Markovian fashion. We give a detailed and intuitive construction of this framework using the recurrent Chinese restaurant process (RCRP) metaphor, as well as a Gibbs sampling algorithm to carry out posterior inference in order to determine the optimal cluster evolution. We demonstrate our model on simulated data by using it to build an infinite dynamic mixture of Gaussian factors, and on a real dataset by using it to build a simple non-parametric dynamic clustering-topic model and apply it to analyze the NIPS-12 document collection.
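The static building block, the ordinary Chinese restaurant process, is easy to sketch; the recurrent variant additionally carries table popularity from one epoch to the next. A minimal sampler (illustrative names, not the paper's code):

```python
import random

def crp_assignments(n, alpha, rng):
    """Seat n customers sequentially by the Chinese restaurant process:
    customer i joins an existing table with probability proportional to
    its occupancy, or opens a new table with probability prop. to alpha."""
    tables = []                       # occupancy count per table
    seats = []                        # table index chosen by each customer
    for i in range(n):
        weights = tables + [alpha]    # weights sum to i + alpha
        r = rng.uniform(0, i + alpha)
        acc, choice = 0.0, len(tables)
        for t, w in enumerate(weights):
            acc += w
            if r < acc:
                choice = t
                break
        if choice == len(tables):
            tables.append(1)          # open a new table (new cluster)
        else:
            tables[choice] += 1
        seats.append(choice)
    return seats

print(crp_assignments(10, alpha=1.0, rng=random.Random(0)))
```

Because the new-table weight stays constant while occupied tables accumulate mass, the number of clusters grows slowly (logarithmically in expectation) with the data, which is the "automatically grows the number of clusters" behavior the abstract describes.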
The uniqueness of a good optimum for K-means
 In ICML
, 2006
Abstract
Cited by 21 (3 self)
If we have found a "good" clustering C of a data set, can we prove that C is not far from the (unknown) best clustering C_opt of these data? Perhaps surprisingly, the answer to this question is sometimes yes. When "goodness" is measured by the distortion of K-means clustering, this paper proves spectral bounds on the distance d(C, C_opt). The bounds exist in the case when the data admits a low-distortion clustering.
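The distortion the bounds refer to is just the K-means objective value. A minimal sketch for concreteness (toy data, illustrative names):

```python
def distortion(points, centers):
    """K-means distortion: sum over points of the squared Euclidean
    distance to the nearest center."""
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, ctr))
                   for ctr in centers)
               for pt in points)

data = [(0.0, 0.0), (0.1, 0.0), (4.0, 0.0), (4.1, 0.0)]
good = [(0.05, 0.0), (4.05, 0.0)]  # centers near the two tight groups
bad = [(2.0, 0.0), (9.0, 0.0)]     # one center stranded far away
print(distortion(data, good) < distortion(data, bad))  # True
```

In the paper's setting, a clustering whose distortion is close to the minimum is shown, under spectral conditions on the data, to be close to the optimal clustering itself.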
Gold standard based ontology evaluation using instance assignment
 In: Proc. of the EON 2006 Workshop
, 2006
Abstract
Cited by 13 (0 self)
An ontology is an explicit formal conceptualization of some domain of interest. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion or application, typically in order to determine which of several ontologies would best suit a particular purpose. This paper proposes an ontology evaluation approach based on comparing an ontology to a gold standard ontology, assuming that both ontologies are constructed over the same set of instances.
Finding low error clusterings
 In COLT
, 2009
Abstract
Cited by 12 (5 self)
A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning-motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairwise information is merely based on heuristics, and the real goal is to achieve low error on the data. In these settings, an arbitrary c-approximation algorithm for some objective would work well only if any c-approximation to that objective is close to the target clustering. In recent work, Balcan et al. [7] have shown how, for both the k-means and k-median objectives, this property allows one to produce clusterings of low error, even for values c such that getting a c-approximation to these objective functions is provably NP-hard. In this paper we analyze the min-sum objective from this perspective. While [7] also considered the min-sum problem, the results they derived for this objective were substantially weaker. In this work we derive new and more subtle structural properties for min-sum in this context and use these to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case. We also analyze the correlation clustering problem from this perspective, and point out interesting differences between this objective and the k-median, k-means, or min-sum objectives.
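Unlike k-means, the min-sum objective charges every intra-cluster pair, not distances to a center. A minimal sketch of the objective itself (toy data, illustrative names):

```python
def min_sum_cost(points, labels, dist):
    """Min-sum clustering objective: total pairwise distance summed over
    pairs of points that share a cluster (each unordered pair once)."""
    n = len(points)
    return sum(dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

d = lambda a, b: abs(a - b)
pts = [0.0, 0.1, 5.0, 5.2]
print(min_sum_cost(pts, [0, 0, 1, 1], d))  # small: only within-group pairs are paid
print(min_sum_cost(pts, [0, 1, 0, 1], d))  # large: distant points share clusters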
Natural Image Segmentation with Adaptive Texture and Boundary Encoding
, 2009
Abstract
Cited by 10 (3 self)
We present a novel algorithm for unsupervised segmentation of natural images that harnesses the principle of minimum description length (MDL). Our method is based on observations that a homogeneously textured region of a natural image can be well modeled by a Gaussian distribution and the region boundary can be effectively coded by an adaptive chain code. The optimal segmentation of an image is the one that gives the shortest coding length for encoding all textures and boundaries in the image, and is obtained via an agglomerative clustering process applied to a hierarchy of decreasing window sizes. The optimal segmentation also provides an accurate estimate of the overall coding length and hence the true entropy of the image. We test our algorithm on two publicly available databases: the Berkeley Segmentation Dataset and the MSRC Object Recognition Database. It achieves state-of-the-art segmentation results compared to other popular methods.
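The boundary-coding half of the scheme can be illustrated with a plain (non-adaptive) Freeman chain code; the adaptive version in the abstract additionally conditions each symbol on the previous direction. A sketch, not the authors' code:

```python
# 8-directional Freeman chain code: symbol k encodes the offset below.
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(boundary):
    """Encode a pixel boundary (a list of 8-connected (x, y) points) as a
    sequence of direction symbols 0..7, whose length is what an MDL
    criterion would count as boundary coding cost."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        codes.append(OFFSETS.index((x1 - x0, y1 - y0)))
    return codes

# A unit square traversed counter-clockwise:
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 2, 4, 6]
```

Entropy-coding these symbols (adaptively, in the paper) gives the boundary term that is traded off against the Gaussian texture coding length when deciding whether two regions should merge.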