Results 1 - 10 of 44
Toward Objective Evaluation of Image Segmentation Algorithms, 2007
Cited by 40 (2 self)
Unsupervised image segmentation is an important component in many image understanding algorithms and practical vision systems. However, evaluation of segmentation algorithms thus far has been largely subjective, leaving a system designer to judge the effectiveness of a technique based only on intuition and results in the form of a few example segmented images. This is largely due to image segmentation being an ill-defined problem—there is no unique ground-truth segmentation of an image against which the output of an algorithm may be compared. This paper demonstrates how a recently proposed measure of similarity, the Normalized Probabilistic Rand (NPR) index, can be used to perform a quantitative comparison between image segmentation algorithms using a hand-labeled set of ground-truth segmentations. We show that the measure allows principled comparisons between segmentations created by different algorithms, as well as segmentations on different images. We outline a procedure for algorithm evaluation through an example evaluation of some familiar algorithms—the mean-shift-based algorithm, an efficient graph-based segmentation algorithm, a hybrid algorithm that combines the strengths of both methods, and expectation maximization. Results are presented on the 300 images in the publicly available Berkeley Segmentation Data Set.
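As a rough illustration of the kind of measure this abstract describes, the sketch below scores a test segmentation against several hand labelings by averaging the plain Rand index (the fraction of pixel pairs on which two labelings agree about "same region" versus "different region"). Treating the Probabilistic Rand score as this average assumes the usual sample-mean estimate over the ground-truth set. The function names are hypothetical, and `expected_pr` is left as an input because the abstract does not say how the expected score is estimated.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of pixel pairs on which two labelings agree (same vs. different region)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same pixels")
    if len(labels_a) < 2:
        raise ValueError("need at least two pixels to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def probabilistic_rand(test_labels, ground_truths):
    """Mean Rand index of one test segmentation against several hand labelings."""
    scores = [rand_index(test_labels, gt) for gt in ground_truths]
    return sum(scores) / len(scores)


def normalize(pr_value, expected_pr, max_pr=1.0):
    """Rescale a score so the expected value maps to 0 and the maximum to 1.

    expected_pr is an assumed input; estimating it is outside this sketch.
    """
    return (pr_value - expected_pr) / (max_pr - expected_pr)


# Toy example: four pixels, two hand labelings.
test_seg = [0, 0, 1, 1]
hand_labels = [[0, 0, 1, 1], [0, 0, 0, 1]]
pr = probabilistic_rand(test_seg, hand_labels)
```

The pairwise loop is quadratic in the number of pixels, so this form is only suitable for small examples; a contingency-table formulation would be the natural choice for full images.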
From Contours to Regions: An Empirical Evaluation
Cited by 40 (6 self)
We propose a generic grouping algorithm that constructs a hierarchy of regions from the output of any contour detector. Our method consists of two steps, an Oriented Watershed Transform (OWT) to form initial regions from contours, followed by construction of an Ultrametric Contour Map (UCM) defining a hierarchical segmentation. We provide extensive experimental evaluation to demonstrate that, when coupled to a high-performance contour detector, the OWT-UCM algorithm produces state-of-the-art image segmentations. These hierarchical segmentations can optionally be further refined by user-specified annotations.
Approximate Clustering without the Approximation
Cited by 22 (14 self)
Approximation algorithms for clustering points in metric spaces are a flourishing area of research, with much effort spent on better understanding the approximation guarantees possible for objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also give us more accurate clusterings. For example, for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown "correct" target clustering, and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth. In this paper, we show that if we make this implicit assumption explicit (that is, if we assume that any c-approximation to the given clustering objective F is ε-close to the target), then we can produce clusterings that are O(ε)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a somewhat surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target, and assuming that any approximately optimal solution is ε-close to the target, even for an approximation factor of, say, c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.
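To make the notion of two clusterings being "close" concrete, the following sketch computes the fraction of points on which two k-clusterings disagree under the best possible renaming of cluster labels. This is one common way to formalize closeness between clusterings and is assumed here for illustration; the function name and the brute-force search over all k! relabelings are choices of this sketch, not something taken from the paper.

```python
from itertools import permutations


def clustering_distance(labels_a, labels_b, k):
    """Fraction of points that disagree under the best relabeling of clusters 0..k-1.

    Brute force over all k! relabelings, so only practical for small k.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same points")
    n = len(labels_a)
    best = n
    for perm in permutations(range(k)):
        # perm[a] is the label in clustering B that cluster a is matched to.
        mismatches = sum(perm[a] != b for a, b in zip(labels_a, labels_b))
        best = min(best, mismatches)
    return best / n


target = [0, 0, 1, 1, 2, 2]
found = [1, 1, 0, 0, 2, 0]  # same grouping up to renaming, except the last point
dist = clustering_distance(target, found, k=3)
```

Under this definition a clustering would be ε-close to the target when the returned value is at most ε. For larger k, a minimum-cost bipartite matching would replace the permutation loop.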
Unsupervised Segmentation of Natural Images via Lossy Data Compression, 2007
Cited by 21 (2 self)
In this paper, we cast natural-image segmentation as a problem of clustering texture features as multivariate mixed data. We model the distribution of the texture features using a mixture of Gaussian distributions. Unlike most existing clustering methods, we allow the mixture components to be degenerate or nearly-degenerate. We contend that this assumption is particularly important for mid-level image segmentation, where degeneracy is typically introduced by using a common feature representation for different textures in an image. We show that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach. Using either 2D texture filter banks or simple fixed-size windows to obtain texture features, the algorithm effectively segments an image by minimizing the overall coding length of the feature vectors. We conduct comprehensive experiments to measure the performance of the algorithm in terms of visual evaluation and a variety of quantitative indices for image segmentation. The algorithm compares favorably against other well-known image-segmentation methods on the Berkeley image database.
Graph partitioning by spectral rounding: Applications in image segmentation and clustering
- In CVPR, 2006
Cited by 14 (2 self)
We introduce a family of spectral partitioning methods. Edge separators of a graph are produced by iteratively reweighting the edges until the graph disconnects into the prescribed number of components. At each iteration a small number of eigenvectors with small eigenvalues are computed and used to determine the reweighting. In this way spectral rounding directly produces discrete solutions, whereas current spectral algorithms must map the continuous eigenvectors to discrete solutions by employing a heuristic geometric separator (e.g., k-means). We show that spectral rounding compares favorably to current spectral approximations on the Normalized Cut criterion (NCut). Results are given for natural image segmentation, medical image segmentation, and clustering. A practical version is shown to converge.
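For readers unfamiliar with the Normalized Cut criterion mentioned above, here is a minimal sketch that evaluates it for a two-way partition of a small weighted graph, using the standard definition cut(A, B)/vol(A) + cut(A, B)/vol(B). It only scores a given partition; it does not implement spectral rounding or any reweighting scheme. The function name and the dense list-of-lists graph representation are assumptions of the sketch.

```python
def ncut(weights, side):
    """Normalized Cut value of a two-way partition of a weighted undirected graph.

    weights: symmetric n x n list of lists of non-negative edge weights.
    side:    list of n booleans, True for nodes in part A, False for part B.
    """
    n = len(weights)
    # Total weight of edges crossing from A to B.
    cut = sum(
        weights[i][j]
        for i in range(n)
        for j in range(n)
        if side[i] and not side[j]
    )
    # Volume of a part: sum of the weighted degrees of its nodes.
    vol_a = sum(sum(weights[i]) for i in range(n) if side[i])
    vol_b = sum(sum(weights[i]) for i in range(n) if not side[i])
    if vol_a == 0 or vol_b == 0:
        raise ValueError("each side of the partition needs positive volume")
    return cut / vol_a + cut / vol_b


# Two tightly linked pairs (0-1 and 2-3) joined by one weak edge (1-2).
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
good = ncut(W, [True, True, False, False])   # cuts only the weak edge
bad = ncut(W, [True, False, True, False])    # cuts both strong edges
```

A lower value indicates a better partition, so `good` should come out well below `bad` on this toy graph.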
The uniqueness of a good optimum for k-means
- ACM International Conference Proceeding Series
Cited by 13 (3 self)
If we have found a "good" clustering C of a data set, can we prove that C is not far from the (unknown) best clustering C_opt of these data? Perhaps surprisingly, the answer to this question is sometimes yes. When "goodness" is measured by the distortion of K-means clustering, this paper proves spectral bounds on the distance d(C, C_opt). The bounds exist in the case when the data admits a low-distortion clustering.
Dynamic Non-Parametric Mixture Models and The Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering
Cited by 11 (2 self)
Clustering is an important data mining task for exploration and visualization of different data types such as news stories, scientific publications, and weblogs. Due to the evolving nature of these data, evolutionary clustering, also known as dynamic clustering, has recently emerged to cope with the challenges of mining temporally smooth clusters over time. A good evolutionary clustering algorithm should fit the data well at each time epoch and, at the same time, result in a smooth cluster evolution that provides the data analyst with a coherent and easily interpretable model. In this paper we introduce the temporal Dirichlet process mixture model (TDPM) as a framework for evolutionary clustering. TDPM is a generalization of the DPM framework for clustering that automatically grows the number of clusters with the data. In our framework, the data is divided into epochs; all data points inside the same epoch are assumed to be fully exchangeable, whereas the temporal order is maintained across epochs. Moreover, the number of clusters in each epoch is unbounded: clusters can persist, die out, or emerge over time, and the actual parameterization of each cluster can also evolve over time in a Markovian fashion. We give a detailed and intuitive construction of this framework using the recurrent Chinese restaurant process (RCRP) metaphor, as well as a Gibbs sampling algorithm to carry out posterior inference in order to determine the optimal cluster evolution. We demonstrate our model on simulated data by using it to build an infinite dynamic mixture of Gaussian factors, and on a real dataset by using it to build a simple non-parametric dynamic clustering-topic model and applying it to analyze the NIPS12 document collection.
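The restaurant metaphor is easiest to see in the ordinary, single-epoch Chinese restaurant process, sketched below: each new customer joins an existing table with probability proportional to its current size, or opens a new table with probability proportional to a concentration parameter. This is the standard non-recurrent process only; the recurrent construction in the paper additionally couples epochs, which this sketch does not attempt. The function name, `alpha`, and `seed` are hypothetical names chosen for the example.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table assignments from a standard (single-epoch) CRP.

    alpha must be positive; larger values make new tables more likely.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table chosen by customer i
    for i in range(n_customers):
        # Total unnormalized mass: i seated customers plus alpha for a new table.
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        choice = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                choice = k
                break
        if choice == len(table_sizes):
            table_sizes.append(0)
        table_sizes[choice] += 1
        assignments.append(choice)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(200, alpha=1.0)
```

The number of occupied tables is not fixed in advance and tends to grow slowly as more customers arrive, which is the sense in which such models let the number of clusters grow with the data.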
Finding low error clusterings
- In COLT, 2009
Cited by 9 (5 self)
A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning-motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairwise information is merely based on heuristics and the real goal is to achieve low error on the data. In these settings, an arbitrary c-approximation algorithm for some objective would work well only if any c-approximation to that objective is close to the target clustering. In recent work, Balcan et al. [7] have shown how, for both the k-means and k-median objectives, this property allows one to produce clusterings of low error, even for values c such that getting a c-approximation to these objective functions is provably NP-hard. In this paper we analyze the min-sum objective from this perspective. While [7] also considered the min-sum problem, the results they derived for this objective were substantially weaker. In this work we derive new and more subtle structural properties for min-sum in this context and use these to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case. We also analyze the correlation clustering problem from this perspective, and point out interesting differences between this objective and the k-median, k-means, or min-sum objectives.
Gold standard based ontology evaluation using instance assignment
- In: Proc. of the EON 2006 Workshop, 2006
Cited by 8 (0 self)
An ontology is an explicit formal conceptualization of some domain of interest. Ontology evaluation is the problem of assessing a given ontology from the point of view of a particular criterion or application, typically in order to determine which of several ontologies would best suit a particular purpose. This paper proposes an ontology evaluation approach based on comparing an ontology to a gold standard ontology, assuming that both ontologies are constructed over the same set of instances.
L. Xu. Regularized spectral learning
- Proceedings of the Artificial Intelligence and Statistics Workshop (AISTATS 05), 2005
Cited by 7 (3 self)
Spectral clustering is a technique for finding groups in data consisting of similarities S_ij between pairs of points. We approach the problem of learning the similarity as a function of other observed features, in order to optimize spectral clustering results on future data. This paper formulates a new objective for learning in spectral clustering that balances a clustering accuracy term, the gap, and a stability term, the eigengap, with the latter in the role of a regularizer. We derive an algorithm to optimize this objective, and semiautomatic methods to choose the optimal regularization. Preliminary experiments confirm the validity of the approach.

