Combining top-down and bottom-up segmentation - In Proceedings IEEE workshop on Perceptual Organization in Computer Vision, CVPR, 2004
Cited by 103 (2 self)
In this work we show how to combine bottom-up and top-down approaches into a single figure-ground segmentation process. This process provides accurate delineation of object boundaries that cannot be achieved by either the top-down or bottom-up approach alone. The top-down approach uses object representation learned from examples to detect an object in a given input image and provide an approximation to its figure-ground segmentation. The bottom-up approach uses image-based criteria to define coherent groups of pixels that are likely to belong together to either the figure or the background part. The combination provides a final segmentation that draws on the relative merits of both approaches: the result is as close as possible to the top-down approximation, but is also constrained by the bottom-up process to be consistent with significant image discontinuities. We construct a global cost function that represents these top-down and bottom-up requirements. We then show how the global minimum of this function can be efficiently found by applying the sum-product algorithm. This algorithm also provides a confidence map that can be used to identify image regions where additional top-down or bottom-up information may further improve the segmentation. Our experiments show that the results derived from the algorithm are superior to results given by a pure top-down or pure bottom-up approach. The scheme has broad applicability, enabling the combined use of a range of existing bottom-up and top-down segmentations.
Towards Automatic Discovery of Object Categories, 2000
Cited by 94 (7 self)
We propose a method to learn heterogeneous models of object classes for visual recognition. The training images contain a preponderance of clutter and learning is unsupervised. Our models represent objects as probabilistic constellations of rigid parts (features). The variability within a class is represented by a joint probability density function on the shape of the constellation and the appearance of the parts. Our method automatically identifies distinctive features in the training set. The set of model parameters is then learned using expectation maximization (see the companion paper [11] for details). When trained on different, unlabeled and unsegmented views of a class of objects, each component of the mixture model can adapt to represent a subset of the views. Similarly, different component models can also "specialize" on sub-classes of an object class. Experiments on images of human heads, leaves from different species of trees, and motor-cars demonstrate that the method works...
Object Detection Using the Statistics of Parts, 2004
Cited by 88 (2 self)
In this paper we describe a trainable object detector and its instantiations for detecting faces and cars at any size, location, and pose. To cope with variation in object orientation, the detector uses multiple classifiers, each spanning a different range of orientation. Each of these classifiers determines whether the object is present at a specified size within a fixed-size image window. To find the object at any location and size, these classifiers scan the image exhaustively. Each classifier is based on the statistics of localized parts. Each part is a transform from a subset of wavelet coefficients to a discrete set of values. Such parts are designed to capture various combinations of locality in space, frequency, and orientation. In building each classifier, we gathered the class-conditional statistics of these part values from representative samples of object and non-object images. We trained each classifier to minimize classification error on the training set by using Adaboost with Confidence-Weighted Predictions (Schapire and Singer, 1999). In detection, each classifier computes the part values within the image window and looks up their associated class-conditional probabilities. The classifier then makes a decision by applying a likelihood ratio test. For efficiency, the classifier evaluates this likelihood ratio in stages. At each stage, the classifier compares the partial likelihood ratio to a threshold and makes a decision about whether to cease evaluation (labeling the input as non-object) or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image. Our trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.
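The staged likelihood ratio test described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the lookup-table layout, the probabilities, and the threshold values are all assumptions chosen for the example.

```python
import math


def staged_log_likelihood_ratio(part_values, stages, thresholds):
    """Accumulate a log-likelihood ratio stage by stage, rejecting early.

    part_values: discrete part values observed in the image window.
    stages: list of stages; each stage is a list of (part_index, table)
        pairs, where table[value] = (P(value | object), P(value | non-object)).
    thresholds: one rejection threshold per stage (hypothetical values).
    Returns (is_object, log_ratio, stages_evaluated).
    """
    log_ratio = 0.0
    for stage_number, (stage, threshold) in enumerate(
        zip(stages, thresholds), start=1
    ):
        for part_index, table in stage:
            p_object, p_non_object = table[part_values[part_index]]
            log_ratio += math.log(p_object / p_non_object)
        # Stop as soon as the partial ratio falls below this stage's threshold.
        if log_ratio < threshold:
            return False, log_ratio, stage_number
    return True, log_ratio, len(stages)


# Hypothetical class-conditional table shared by three binary-valued parts.
table = {0: (0.2, 0.8), 1: (0.8, 0.2)}
stages = [[(0, table)], [(1, table), (2, table)]]
thresholds = [0.0, 1.0]

accepted = staged_log_likelihood_ratio([1, 1, 1], stages, thresholds)
rejected = staged_log_likelihood_ratio([0, 1, 1], stages, thresholds)
```

In this toy setup the second window is rejected after the first stage, so the two remaining parts are never looked up. That early exit is the source of the efficiency gain the abstract refers to.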
A Computational Model for Visual Selection - Neural Computation, 1999
Cited by 77 (14 self)
We propose a computational model for detecting and localizing instances from an object class in static grey level images. We divide detection into visual selection and final classification, concentrating on the former: Drastically reducing the number of candidate regions which require further, usually more intensive, processing, but with a minimum of computation and missed detections. Bottom-up processing is based on local groupings of edge fragments constrained by loose geometrical relationships. They have no a priori semantic or geometric interpretation. The role of training is to select special groupings which are moderately likely at certain places on the object but rare in the background. We show that the statistics in both populations are stable. The candidate regions are those which contain global arrangements of several local groupings. Whereas our model was not conceived to explain brain functions, it does cohere with evidence about the functions of neurons in V1 and V2, such ...
Coarse-to-Fine Face Detection, 2001
Cited by 69 (11 self)
We study visual selection: Detect and roughly localize all instances of a generic object class, such as a face, in a greyscale scene, measuring performance in terms of computation and false alarms. Our approach is sequential testing which is coarse-to-fine in both the exploration of poses and the representation of objects. All the tests are binary and indicate the presence or absence of loose spatial arrangements of oriented edge fragments. Starting from training examples, we recursively find larger and larger arrangements which are “decomposable,” which implies the probability of an arrangement appearing on an object decays slowly with its size. Detection means finding a sufficient number of arrangements of each size along a decreasing sequence of pose cells. At the beginning, the tests are simple and universal, accommodating many poses simultaneously, but the false alarm rate is relatively high. Eventually, the tests are more discriminating, but also more complex and dedicated to specific poses. As a result, the spatial distribution of processing is highly skewed and detection is rapid, but at the expense of (isolated) false alarms which, presumably, could be eliminated with localized, more intensive, processing.
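The idea of testing along a decreasing sequence of pose cells can be sketched in Python as a recursive search. This is only a toy illustration under strong assumptions: the pose is a single integer position, a cell is a half-open interval, and the binary test is an idealized stand-in that never misses the true pose. All function names here are hypothetical and do not come from the paper.

```python
def coarse_to_fine_search(cell, passes_test, split, is_finest):
    """Explore pose cells depth-first, descending only where the test passes."""
    if not passes_test(cell):
        return []          # the whole cell is discarded with a single test
    if is_finest(cell):
        return [cell]      # a surviving finest cell counts as a detection
    detections = []
    for child in split(cell):
        detections.extend(
            coarse_to_fine_search(child, passes_test, split, is_finest)
        )
    return detections


# Toy setup: 16 candidate positions, one object at position 5 (assumed).
TRUE_POSE = 5
tested_cells = []


def passes_test(cell):
    tested_cells.append(cell)          # record how many tests are run
    low, high = cell
    return low <= TRUE_POSE < high     # idealized test with no missed detections


def split(cell):
    low, high = cell
    middle = (low + high) // 2
    return [(low, middle), (middle, high)]


def is_finest(cell):
    low, high = cell
    return high - low == 1


detections = coarse_to_fine_search((0, 16), passes_test, split, is_finest)
```

With these assumptions the search runs 9 tests instead of the 16 needed to check every position separately, and most of the tests are spent near the object. A real detector would use noisier tests, so some false alarms would survive to the finest level.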
Learning Optimized Features for Hierarchical Models of Invariant Object Recognition, 2002
Cited by 56 (28 self)
There is an ongoing debate over the capabilities of hierarchical neural feed-forward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feed-forward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches, but focus on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network.
A Coarse-to-Fine Strategy for Multi-Class Shape Detection, 2004
Cited by 30 (8 self)
Multi-class shape detection, in the sense of recognizing and localizing instances from multiple shape classes, is formulated as a two-step process in which local indexing primes global interpretation. During indexing a list of instantiations (shape identities and poses) is compiled constrained only by no missed detections at the expense of false positives. Global information, such as expected relationships among poses, is incorporated afterward to remove ambiguities. This division is motivated by computational efficiency. In addition, indexing itself is organized as a coarse-to-fine search simultaneously in class and pose. This search can be interpreted as successive approximations to likelihood ratio tests arising from a simple (“naive Bayes”) statistical model for the edge maps extracted from the original images. The key to constructing efficient “hypothesis tests” for multiple classes and poses is local OR’ing; in particular, spread edges provide imprecise but common and locally invariant features. Natural tradeoffs then emerge between discrimination and the pattern of spreading. These are analyzed mathematically within the model-based framework and the whole procedure is illustrated by experiments in reading license plates.
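Local OR'ing of edges, as mentioned in this abstract, can be illustrated with a short Python sketch. The square neighborhood and the `radius` parameter are assumptions made for the example; the paper's actual spreading pattern may differ.

```python
def spread_edges(edge_map, radius):
    """Return a binary map in which each detected edge is OR'ed into
    every cell of its (2 * radius + 1) x (2 * radius + 1) neighborhood."""
    rows, cols = len(edge_map), len(edge_map[0])
    spread = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if edge_map[r][c]:
                # Mark every in-bounds neighbor of this edge as "edge present".
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        spread[rr][cc] = 1
    return spread


# The same edge observed at two slightly different positions.
edge_at_col_2 = [[0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 0]]
edge_at_col_3 = [[0, 0, 0, 0, 0],
                 [0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0]]

spread_a = spread_edges(edge_at_col_2, radius=1)
spread_b = spread_edges(edge_at_col_3, radius=1)
```

After spreading, both maps are "on" at row 1, column 2, so a test placed there fires for either position of the edge. This is the local invariance the abstract describes, and it comes at the cost of the feature also firing more often on background, which is the tradeoff with discrimination.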
An Integrated Network for Invariant Visual Detection and Recognition - Vision Research, 2003
Cited by 24 (2 self)
We describe an architecture for invariant visual detection and recognition. Learning is performed in a single central module. The architecture makes use of copies of retinotopic layers of local features, with a particular design of inputs and outputs, that allows them to be primed either to attend to a particular location, or to attend to a particular object representation. In the former
Unsupervised Learning of Models for Visual Object Class Recognition, 1999
Cited by 11 (1 self)
We present a method to learn object class models for the purpose of object recognition. We focus on a particular type of model where objects are represented as constellations of rigid features (parts). The variability within a class is represented by a joint probability density function (pdf) on the shape of the constellation and the output of feature detectors. The pdf may be estimated from training data once a model structure (type and number of features) has been specified. The method automatically identifies distinctive features in the training set and learns the statistical shape model. It is assumed that a set of generic feature detectors is available for the learning algorithm to choose from. The entire set of model parameters is learned using expectation maximization. 1 Introduction and Related Work We are interested in the problem of recognizing members of object classes, where we define an object class as a collection of objects which share characteristic parts or features tha...
Coarse-to-Fine Visual Selection, 1999
Cited by 5 (2 self)
We study visual selection: Detect and roughly localize all instances of a generic object class, such as a face, in a greyscale scene, measuring performance in terms of computation and false alarms. Our approach is sequential testing which is coarse-to-fine in both the exploration of poses and the representation of objects. All the tests are binary and indicate the presence or absence of loose spatial arrangements of oriented edge fragments. Starting from training examples, we recursively find larger and larger arrangements which are "decomposable," which implies the probability of an arrangement appearing on an object decays slowly with its size. Detection means finding a sufficient number of arrangements of each size along a decreasing sequence of pose cells. At the beginning, the tests are simple and universal, accommodating many poses simultaneously, but the false alarm rate is relatively high. Eventually, the tests are more discriminating, but also more complex a...

