Results 1-10 of 174
Learning to detect natural image boundaries using local brightness, color, and texture cues
PAMI, 2004
Cited by 406 (16 self)
The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, we train a classifier using human-labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are 1) that cue combination can be performed adequately with a simple linear model and 2) that a proper, explicit treatment of texture is required to detect boundaries in natural images. Index Terms: Texture, supervised learning, cue combination, natural images, ground truth segmentation data set, boundary detection, boundary localization.
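As a rough illustration of the "simple linear model" for cue combination mentioned in this abstract, the sketch below fits a logistic model that maps three local cue responses to a boundary posterior. It is not the authors' implementation: the function names (`fit_logistic`, `boundary_posterior`), the toy cue values, and the plain gradient-descent training loop are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias of a logistic model by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample under the current model.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def boundary_posterior(w, b, cues):
    """Posterior probability of a boundary given a vector of cue responses."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, cues)))

# Hypothetical (brightness, color, texture) gradient responses in [0, 1];
# label 1 means a human marked a boundary at that location.
X = [(0.9, 0.8, 0.7), (0.8, 0.6, 0.9), (0.7, 0.9, 0.8),
     (0.1, 0.2, 0.1), (0.2, 0.1, 0.3), (0.3, 0.2, 0.2)]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
p_strong = boundary_posterior(w, b, (0.85, 0.8, 0.8))
p_weak = boundary_posterior(w, b, (0.1, 0.1, 0.2))
```

In this toy setting, `p_strong` comes out above 0.5 and `p_weak` below it; thresholding the posterior at different levels is what would trace out a precision-recall curve.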
Image retrieval: ideas, influences, and trends of the new age
ACM Computing Surveys, 2008
Cited by 269 (8 self)
We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.
An Introduction to MCMC for Machine Learning
2003
Cited by 221 (2 self)
The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses new interesting research horizons.
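One of the basic building blocks of Markov chain Monte Carlo simulation is the Metropolis algorithm. The sketch below is a minimal random-walk Metropolis sampler for a one-dimensional target; it is offered only as an illustration, not as code from the paper. The function name `metropolis`, the Gaussian proposal, the step size, and the standard-normal target are assumptions chosen for the example.

```python
import math
import random

def metropolis(log_density, x0, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalised log density."""
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected move repeats the current state
    return samples

def log_std_normal(x):
    # Log density of a standard normal, up to an additive constant.
    return -0.5 * x * x

samples = metropolis(log_std_normal, x0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because only the ratio of densities enters the acceptance test, the target needs to be known only up to a normalising constant. For this target, the sample mean and variance should land near 0 and 1.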
Random walks for image segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 215 (18 self)
A novel method is proposed for performing multilabel, interactive image segmentation. Given a small number of pixels with user-defined (or predefined) labels, one can analytically and quickly determine the probability that a random walker starting at each unlabeled pixel will first reach one of the prelabeled pixels. By assigning each pixel to the label for which the greatest probability is calculated, a high-quality image segmentation may be obtained. Theoretical properties of this algorithm are developed along with the corresponding connections to discrete potential theory and electrical circuits. This algorithm is formulated in discrete space (i.e., on a graph) using combinatorial analogues of standard operators and principles from continuous potential theory, allowing it to be applied in arbitrary dimension on arbitrary graphs. Index Terms: Image segmentation, interactive segmentation, graph theory, random walks, combinatorial Dirichlet problem, harmonic functions, Laplace equation, graph cuts, boundary completion.
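The idea in this abstract can be sketched on a tiny graph. At every unlabeled node, the probability of first reaching a given seed is the weighted average of its neighbours' probabilities, which is the combinatorial Dirichlet problem. The example below solves it by simple Gauss-Seidel iteration rather than the direct linear solve a real implementation would likely use. The function name `random_walker_probs`, the five-node chain, and the edge weights are assumptions made for illustration.

```python
def random_walker_probs(weights, seeds, label, n_iter=2000):
    """Probability that a walker from each node first hits a seed with `label`.

    weights: dict {(i, j): w} of undirected weighted edges.
    seeds:   dict {node: label} of prelabeled nodes.
    """
    nbrs = {}
    for (i, j), w in weights.items():
        nbrs.setdefault(i, []).append((j, w))
        nbrs.setdefault(j, []).append((i, w))
    # Seeds are fixed at 1 (matching label) or 0; unlabeled nodes start at 0.
    x = {v: (1.0 if seeds.get(v) == label else 0.0) for v in nbrs}
    for _ in range(n_iter):
        for v in nbrs:
            if v in seeds:
                continue
            total = sum(w for _, w in nbrs[v])
            # Harmonic condition: weighted average of neighbouring values.
            x[v] = sum(w * x[u] for u, w in nbrs[v]) / total
    return x

seeds = {0: "A", 4: "B"}

# Uniform chain 0-1-2-3-4: probabilities vary linearly between the seeds.
uniform = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0}
p_uniform = random_walker_probs(uniform, seeds, "B")

# A weak edge between nodes 2 and 3 plays the role of an image boundary.
weak = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.01, (3, 4): 1.0}
p_weak_b = random_walker_probs(weak, seeds, "B")
labels = {v: ("B" if p > 0.5 else "A") for v, p in p_weak_b.items()}
```

With the weak edge in place, nodes 0 to 2 are assigned label A and nodes 3 and 4 label B: the walker rarely crosses the low-weight edge, which is how intensity boundaries would steer the segmentation.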
Hierarchical Bayesian Inference in the Visual Cortex
2002
Cited by 172 (0 self)
In this paper, we propose a Bayesian theory of hierarchical cortical computation based both on (a) the mathematical and computational ideas of computer vision and pattern theory and on (b) recent neurophysiological experimental evidence. We [2] have proposed that Grenander's pattern theory [3] could potentially model the brain as a generative model in such a way that feedback serves to disambiguate and 'explain away' the earlier representation. The Helmholtz machine [4, 5] was an excellent step towards approximating this proposal, with feedback implementing priors. Its development, however, was rather limited, dealing only with binary images. Moreover, its feedback mechanisms were engaged only during the learning of the feedforward connections but not during perceptual inference, though the Gibbs sampling process for inference can potentially be interpreted as top-down feedback disambiguating low-level representations. Rao and Ballard's predictive coding/Kalman filter model [6] did integrate generative feedback in the perceptual inference process, but it was primarily a linear model and thus severely limited in practical utility. The data-driven Markov chain Monte Carlo approach of Zhu and colleagues [7, 8] might be the most successful recent application of this proposal in solving real and difficult computer vision problems using generative models, though its connection to the visual cortex has not been explored. Here, we bring in a powerful and widely applicable paradigm from artificial intelligence and computer vision to propose some new ideas about the algorithms of visual cortical processing and the nature of representations in the visual cortex. We will review some of our and others' neurophysiological experimental data to lend support to these ideas.
Image Parsing: Unifying Segmentation, Detection, and Recognition
2005
Cited by 159 (18 self)
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation in a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and reconfigures it dynamically using a set of reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters.
Robust Higher Order Potentials for Enforcing Label Consistency
2009
Cited by 136 (23 self)
This paper proposes a novel framework for labelling problems which is able to combine multiple segmentations in a principled manner. Our method is based on higher order conditional random fields and uses potentials defined on sets of pixels (image segments) generated using unsupervised segmentation algorithms. These potentials enforce label consistency in image regions and can be seen as a generalization of the commonly used pairwise contrast-sensitive smoothness potentials. The higher order potential functions used in our framework take the form of the Robust P^n model and are more general than the P^n Potts model recently proposed by Kohli et al. We prove that the optimal swap and expansion moves for energy functions composed of these potentials can be computed by solving an st-mincut problem. This enables the use of powerful graph cut based move making algorithms for performing inference in the framework. We test our method on the problem of multiclass object segmentation by augmenting the conventional CRF used for object segmentation with higher order potentials defined on image regions. Experiments on challenging data sets show that integration of higher order potentials quantitatively and qualitatively improves results, leading to much better definition of object boundaries.
MCMC-based particle filtering for tracking a variable number of interacting targets
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Cited by 116 (6 self)
We describe a particle filter that effectively deals with interacting targets, i.e., targets that are influenced by the proximity and/or behavior of other targets. The particle filter includes a Markov random field (MRF) motion prior that helps maintain the identity of targets throughout an interaction, significantly reducing tracker failures. We show that this MRF prior can be easily implemented by including an additional interaction factor in the importance weights of the particle filter. However, the computational requirements of the resulting multitarget filter render it unusable for large numbers of targets. Consequently, we replace the traditional importance sampling step in the particle filter with a novel Markov chain Monte Carlo (MCMC) sampling step to obtain a more efficient MCMC-based multitarget filter. We also show how to extend this MCMC-based filter to address a variable number of interacting targets. Finally, we present both qualitative and quantitative experimental results, demonstrating that the resulting particle filters deal efficiently and effectively with complicated target interactions.
Detecting and reading text in natural scenes
In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2004
Cited by 80 (2 self)
This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e., feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set, and the unread text is typically small and distant from the viewer.
A Stochastic Grammar of Images
Foundations and Trends in Computer Graphics and Vision, 2006
Cited by 80 (17 self)
This exploratory paper quests for a stochastic and context-sensitive grammar of images. The grammar should achieve the following four objectives and thus serves as a unified framework of representation, learning, and recognition for a large number of object categories. (i) The grammar represents both the hierarchical decompositions from scenes, to objects, parts, primitives and pixels by terminal and nonterminal nodes and the contexts for spatial and functional relations by horizontal links between the nodes. It formulates each object category as the set of all possible valid configurations produced by the grammar. (ii) The grammar is embodied in a simple And–Or graph representation where each Or-node points to alternative subconfigurations and an And-node is decomposed into a number of components. This representation supports recursive top-down/bottom-up procedures for image parsing under the Bayesian framework and makes it convenient to scale