Results 1 - 10
of
33
Object Detection with Discriminatively Trained Part Based Models
"... We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their ..."
Abstract
-
Cited by 171 (14 self)
- Add to MetaCart
We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL datasets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
A discriminatively trained, multiscale, deformable part model
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR-2008
, 2008
"... This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge ..."
Abstract
-
Cited by 155 (8 self)
- Add to MetaCart
This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge in ten out of twenty categories. The system relies heavily on deformable parts. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge. Our system also relies heavily on new methods for discriminative training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. We believe that our training methods will eventually make possible the effective use of more latent information such as hierarchical (grammar) models and models involving latent three dimensional pose. 1.
A Stochastic Grammar of Images
- Foundations and Trends in Computer Graphics and Vision
, 2006
"... This exploratory paper quests for a stochastic and context sensitive grammar of images. The grammar should achieve the following four objectives and thus serves as a unified framework of representation, learning, and recognition for a large number of object categories. (i) The grammar represents bot ..."
Abstract
-
Cited by 38 (8 self)
- Add to MetaCart
This exploratory paper quests for a stochastic and context sensitive grammar of images. The grammar should achieve the following four objectives and thus serves as a unified framework of representation, learning, and recognition for a large number of object categories. (i) The grammar represents both the hierarchical decompositions from scenes, to objects, parts, primitives and pixels by terminal and non-terminal nodes and the contexts for spatial and functional relations by horizontal links between the nodes. It formulates each object category as the set of all possible valid configurations produced by the grammar. (ii) The grammar is embodied in a simple And–Or graph representation where each Or-node points to alternative sub-configurations and an And-node is decomposed into a number of components. This representation supports recursive top-down/bottom-up procedures for image parsing under the Bayesian framework and make it convenient to scale
Efros. Unsupervised discovery of visual object class hierarchies
- In Proc. CVPR
, 2008
"... Objects in the world can be arranged into a hierarchy based on their semantic meaning (e.g. organism – animal – feline – cat). What about defining a hierarchy based on the visual appearance of objects? This paper investigates ways to automatically discover a hierarchical structure for the visual wor ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
Objects in the world can be arranged into a hierarchy based on their semantic meaning (e.g. organism – animal – feline – cat). What about defining a hierarchy based on the visual appearance of objects? This paper investigates ways to automatically discover a hierarchical structure for the visual world from a collection of unlabeled images. Previous approaches for unsupervised object and scene discovery focused on partitioning the visual data into a set of nonoverlapping classes of equal granularity. In this work, we propose to group visual objects using a multi-layer hierarchy tree that is based on common visual elements. This is achieved by adapting to the visual domain the generative Hierarchical Latent Dirichlet Allocation (hLDA) model previously used for unsupervised discovery of topic hierarchies in text. Images are modeled using quantized local image regions as analogues to words in text. Employing the multiple segmentation framework of Russell et al. [22], we show that meaningful object hierarchies, together with object segmentations, can be automatically learned from unlabeled and unsegmented image collections without supervision. We demonstrate improved object classification and localization performance using hLDA over the previous non-hierarchical method on the MSRC dataset [33]. 1.
Describing Visual Scenes Using Transformed Objects and Parts
- INT J COMPUT VIS
, 2005
"... We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building i ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.
The generalized A* architecture
- Journal of Artificial Intelligence Research
"... We consider the problem of computing a lightest derivation of a global structure using a set of weighted rules. A large variety of inference problems in AI can be formulated in this framework. We generalize A * search and heuristics derived from abstractions to a broad class of lightest derivation p ..."
Abstract
-
Cited by 24 (6 self)
- Add to MetaCart
We consider the problem of computing a lightest derivation of a global structure using a set of weighted rules. A large variety of inference problems in AI can be formulated in this framework. We generalize A * search and heuristics derived from abstractions to a broad class of lightest derivation problems. We also describe a new algorithm that searches for lightest derivations using a hierarchy of abstractions. Our generalization of A * gives a new algorithm for searching AND/OR graphs in a bottom-up fashion. We discuss how the algorithms described here provide a general architecture for addressing the pipeline problem — the problem of passing information back and forth between various stages of processing in a perceptual system. We consider examples in computer vision and natural language processing. We apply the hierarchical search algorithm to the problem of estimating the boundaries of convex objects in grayscale images and compare it to other search methods. A second set of experiments demonstrate the use of a new compositional model for finding salient curves in images. 1.
Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
- in Advances in Neural Information Processing Systems 19
, 2007
"... We introduce a Probabilistic Grammar-Markov Model (PGMM) which couples probabilistic context free grammars and Markov Random Fields. These PGMMs are generative models defined over attributed features and are used to detect and classify objects in natural images. PGMMs are designed so that they can p ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
We introduce a Probabilistic Grammar-Markov Model (PGMM) which couples probabilistic context free grammars and Markov Random Fields. These PGMMs are generative models defined over attributed features and are used to detect and classify objects in natural images. PGMMs are designed so that they can perform rapid inference, parameter learning, and the more difficult task of structure induction. PGMMs can deal with unknown 2D pose (position, orientation, and scale) in both inference and learning, different appearances, or aspects, of the model. The PGMMs can be learnt in an unsupervised manner where the image can contain one of an unknown number of objects of different categories or even be pure background. We first study the weakly supervised case, where each image contains an example of the (single) object of interest, and then generalize to less supervised cases. The goal of this paper is theoretical but, to provide proof of concept, we demonstrate results from this approach on a subset of the Caltech dataset (learning on a training set and evaluating on a testing set). Our results are generally comparable with the current state of the art, and our inference is performed in less than five seconds.
Deconvolutional networks
- In CVPR
, 2010
"... Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features that capture these mid-level cues spontaneously emerge from image data. Our approach is based on the convolutional decomposition of images under a sparsity constraint and is totally unsupervised. By building a hierarchy of such decompositions we can learn rich feature sets that are a robust image representation for both the analysis and synthesis of images. 1.
Learning the taxonomy and models of categories present in arbitrary images
- In ICCV
, 2007
"... This paper proposes, and presents a solution to, the problem of simultaneous learning of multiple visual categories present in an arbitrary image set and their intercategory relationships. These relationships, also called their taxonomy, allow categories to be defined recursively, as spatial configu ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
This paper proposes, and presents a solution to, the problem of simultaneous learning of multiple visual categories present in an arbitrary image set and their intercategory relationships. These relationships, also called their taxonomy, allow categories to be defined recursively, as spatial configurations of (simpler) subcategories each of which may be shared by many categories. Each image is represented by a segmentation tree, whose structure captures recursive embedding of image regions in a multiscale segmentation, and whose nodes contain the associated region properties. The presence of any occurring categories is reflected in the occurrence of associated, similar subtrees within the image trees. Similar subtrees across the entire image set are clustered. Each cluster corresponds to a discovered category, represented by the cluster properties. A (subcategory) cluster of small matching subtrees may occur within multiple clusters (categories) of larger matching subtrees, in different spatial relationships with subtrees from other small clusters. Such recursive embedding, grouping and intersection of clusters is captured in a directed acyclic graph (DAG) which represents the discovered taxonomy. Detection, recognition and segmentation of any of the learned categories present in a new image are simultaneously conducted by matching the segmentation tree of the new image with the learned DAG. This matching also yields a semantic explanation of the recognized category, in terms of the presence of its subcategories. Experiments with a newly compiled dataset of four-legged animals demonstrate good cross-category resolvability. 1.
Learning the Compositional Nature of Visual Objects
"... The compositional nature of visual objects significantly limits their representation complexity and renders learning of structured object models tractable. Adopting this modeling strategy we both (i) automatically decompose objects into a hierarchy of relevant compositions and we (ii) learn such a c ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
The compositional nature of visual objects significantly limits their representation complexity and renders learning of structured object models tractable. Adopting this modeling strategy we both (i) automatically decompose objects into a hierarchy of relevant compositions and we (ii) learn such a compositional representation for each category without supervision. The compositional structure supports feature sharing already on the lowest level of small image patches. Compositions are represented as probability distributions over their constituent parts and the relations between them. The global shape of objects is captured by a graphical model which combines all compositions. Inference based on the underlying statistical model is then employed to obtain a category level object recognition system. Experiments on large standard benchmark datasets underline the competitive recognition performance of this approach and they provide insights into the learned compositional structure of objects. 1.

