Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
Cited by 9 (3 self)
The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling is quite beneficial, and may be the critical missing ingredient in scene understanding systems.
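To make "encoding appearance plus 2D spatial context between instances" concrete, here is a toy exemplar graph. It is a hypothetical illustration, not the paper's model: the class `ToyMemex`, the fixed appearance-distance threshold, and the exponential voting rule are all assumptions. Instances are linked by appearance similarity, co-occurring instances are linked by context edges that store their 2D offset, and a hidden object's category is guessed by following both kinds of edge.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# (instance id, category label, appearance vector, (x, y) centre in its image)
Instance = Tuple[str, str, Sequence[float], Tuple[float, float]]


class ToyMemex:
    """Toy exemplar graph (hypothetical, not the paper's model).

    Appearance edges are implicit: two instances are linked when their appearance
    vectors are within `sim_threshold`. Context edges link instances that co-occur
    in one image and store the 2D offset between them.
    """

    def __init__(self, sim_threshold: float):
        self.sim_threshold = sim_threshold
        self.appearance: Dict[str, Sequence[float]] = {}
        self.category: Dict[str, str] = {}
        self.context: Dict[str, List[Tuple[str, Tuple[float, float]]]] = defaultdict(list)

    def add_image(self, objects: List[Instance]) -> None:
        for oid, cat, app, _ in objects:
            self.appearance[oid] = app
            self.category[oid] = cat
        # Context edges between every ordered pair of co-occurring instances.
        for a_id, _, _, (ax, ay) in objects:
            for b_id, _, _, (bx, by) in objects:
                if a_id != b_id:
                    self.context[a_id].append((b_id, (bx - ax, by - ay)))

    def similar(self, app: Sequence[float]) -> List[str]:
        return [oid for oid, v in self.appearance.items()
                if math.dist(app, v) <= self.sim_threshold]

    def predict_hidden(self, visible_app, offset) -> Optional[str]:
        """Guess the category of a hidden object at `offset` from a visible object.

        Follow appearance edges to similar exemplars, then their context edges,
        weighting each neighbour's category by how close its stored offset is to
        the queried one. Returns None if no similar exemplar is found.
        """
        votes: Dict[str, float] = defaultdict(float)
        for oid in self.similar(visible_app):
            for nb, d in self.context[oid]:
                votes[self.category[nb]] += math.exp(-math.dist(d, offset))
        return max(votes, key=votes.get) if votes else None
```

The point of the sketch is that the prediction is driven by specific exemplars and their recorded surroundings rather than by category-to-category statistics.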
Understanding Images of Groups of People
Cited by 6 (1 self)
In many social settings, images of groups of people are captured. The structure of this group provides meaningful context for reasoning about individuals in the group, and about the structure of the scene as a whole. For example, men are more likely to stand on the edge of an image than women. Instead of treating each face independently from all others, we introduce contextual features that encapsulate the group structure locally (for each person in the group) and globally (the overall structure of the group). This “social context” allows us to accomplish a variety of tasks, such as demographic recognition, calculating scene and camera parameters, and even event recognition. We perform human studies to show that this context aids recognition of demographic information in images of strangers.
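The abstract names per-person ("local") and whole-group ("global") contextual features without defining them. As a minimal sketch, assuming each face is given as a centre coordinate plus a width, the code below computes each person's offset from the group centroid in face widths, a flag for standing at the left or right edge of the group, and the group's size and spread. The `Face` type, the feature choices, and both function names are hypothetical, not the paper's descriptors.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass
class Face:
    x: float     # face centre, horizontal image coordinate (pixels)
    y: float     # face centre, vertical image coordinate (pixels)
    size: float  # face width (pixels), used to normalize distances


def local_features(faces: List[Face]) -> List[Tuple[float, float, bool]]:
    """Per-person context: offset from the group centroid in face widths, and
    whether the person is the leftmost or rightmost member. Assumes at least one face."""
    cx = mean(f.x for f in faces)
    cy = mean(f.y for f in faces)
    left = min(f.x for f in faces)
    right = max(f.x for f in faces)
    return [((f.x - cx) / f.size, (f.y - cy) / f.size, f.x in (left, right))
            for f in faces]


def global_features(faces: List[Face]) -> Dict[str, float]:
    """Whole-group context: group size and spread, in units of the mean face width."""
    s = mean(f.size for f in faces)
    return {
        "count": len(faces),
        "spread_x": pstdev([f.x for f in faces]) / s,
        "spread_y": pstdev([f.y for f in faces]) / s,
    }
```

Dividing by face width is one way to make the features comparable across photos taken at different distances from the group; it is a design choice of this sketch.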
Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
Cited by 6 (4 self)
In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many sub-tasks. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which maximize the joint likelihood of the sub-tasks while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to tell earlier classifiers which error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification.
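The cascade described above can be sketched as a forward pass over "black-box" classifiers, where each classifier is only assumed to be a callable from a feature vector to a score. This is a minimal illustration with hypothetical names: it shows how layer-1 outputs become extra inputs to layer 2, and it omits training, including the feedback step the abstract describes.

```python
from typing import Callable, Dict, Sequence

# A "black-box" sub-task classifier: any callable from a feature vector to a score.
Classifier = Callable[[Sequence[float]], float]


def cascade_predict(x: Sequence[float],
                    first_layer: Dict[str, Classifier],
                    second_layer: Dict[str, Classifier]) -> Dict[str, float]:
    """Forward pass of a two-layer cascade over several sub-tasks."""
    # Layer 1: each sub-task's original classifier sees only the raw features.
    first_out = {task: clf(x) for task, clf in first_layer.items()}
    # Layer 2: each sub-task sees the raw features plus every layer-1 output,
    # appended in sorted task order so the input layout is fixed.
    augmented = list(x) + [first_out[t] for t in sorted(first_out)]
    return {task: clf(augmented) for task, clf in second_layer.items()}
```

Because layer 2 only consumes scores through the callable interface, no classifier's internals need to change, which is the property the abstract emphasizes.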
Using stereo for object recognition
Accepted to appear in the Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2010
Cited by 5 (1 self)
There has been significant progress recently in object recognition research, but many current approaches still fail for object classes with few distinctive features, and in settings with significant clutter and viewpoint variance. One such setting is visual search in mobile robotics, where tasks such as finding a mug or stapler require robust recognition. The focus of this paper is on integrating stereo vision with appearance-based recognition to increase accuracy and efficiency. We propose a model that uses a chamfer-type silhouette classifier weighted by a prior on scale that is robust to missing stereo depth information. Our approach is validated on a set of challenging indoor scenes containing mugs and shoes, where we find that priors remove a significant number of false positives, improving the average precision by 0.2 on each dataset. We also experiment with an additional classifier by Felzenszwalb et al. [1] to demonstrate the approach’s robustness.
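A minimal sketch of "a chamfer-type score weighted by a scale prior" follows, under stated assumptions: edges and silhouette templates are given as 2D point lists, the prior is a Gaussian on the log ratio between the observed image height and the height a pinhole camera would predict from stereo depth, and a missing depth value yields a flat prior so the score falls back to appearance alone. The prior's form, `sigma`, and all function names are hypothetical, not the paper's formulation.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def chamfer_distance(template: Sequence[Point], edges: Sequence[Point]) -> float:
    """Mean distance from each template silhouette point to its nearest image edge point."""
    return sum(min(math.dist(t, e) for e in edges) for t in template) / len(template)


def scale_prior(height_px: float, depth_m: Optional[float], focal_px: float,
                object_height_m: float, sigma: float = 0.25) -> float:
    """Hypothetical scale prior from stereo depth.

    Pinhole projection predicts an image height of focal_px * object_height_m / depth_m;
    the prior is a Gaussian on the log ratio of observed to predicted height. When
    stereo depth is missing, it returns a flat 1.0 so only appearance counts.
    """
    if depth_m is None or depth_m <= 0:
        return 1.0
    expected_px = focal_px * object_height_m / depth_m
    r = math.log(height_px / expected_px)
    return math.exp(-0.5 * (r / sigma) ** 2)


def detection_score(template, edges, height_px, depth_m, focal_px, object_height_m):
    # Appearance term (smaller chamfer distance gives a larger score), weighted by the scale prior.
    appearance = math.exp(-chamfer_distance(template, edges))
    return appearance * scale_prior(height_px, depth_m, focal_px, object_height_m)
```

A candidate whose size is implausible for its measured depth is down-weighted, which is one way a scale prior can suppress false positives.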
Context by Region Ancestry
Cited by 5 (0 self)
In this paper, we introduce a new approach for modeling visual context. For this purpose, we consider the leaves of a hierarchical segmentation tree as elementary units. Each leaf is described by features of its ancestral set, the regions on the path linking the leaf to the root. We construct region trees by using a high-performance segmentation method. We then learn the importance of different descriptors (e.g. color, texture, shape) of the ancestors for classification. We report competitive results on the MSRC segmentation dataset and the MIT scene dataset, showing that region ancestry efficiently encodes information about discriminative parts, objects and scenes.
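The ancestral set can be sketched with a parent map over a segmentation tree: walk from a leaf to the root, then concatenate each region's feature vector. This is a minimal sketch with hypothetical names; padding or truncating the path to a fixed `depth` is an assumption made here so that every leaf gets a vector of the same length, and the learned weighting of descriptors mentioned in the abstract is not shown.

```python
from typing import Dict, List, Optional, Sequence


def ancestral_set(leaf: int, parent: Dict[int, Optional[int]]) -> List[int]:
    """Regions on the path from a leaf up to the root of the segmentation tree, leaf first."""
    path: List[int] = []
    node: Optional[int] = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def ancestry_descriptor(leaf: int, parent: Dict[int, Optional[int]],
                        features: Dict[int, Sequence[float]], depth: int) -> List[float]:
    """Concatenate the feature vectors of a leaf's ancestral set.

    The path is padded by repeating the root, or truncated (keeping the regions
    nearest the leaf), to exactly `depth` regions; this is a choice of the sketch.
    """
    path = ancestral_set(leaf, parent)
    path = (path + [path[-1]] * depth)[:depth]
    desc: List[float] = []
    for region in path:
        desc.extend(features[region])
    return desc
```

A classifier trained on such descriptors sees the leaf together with progressively larger enclosing regions, which is how the leaf inherits context from its ancestors.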
Context Based Object Categorization: A Critical Survey
Cited by 4 (0 self)
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images present occlusion, poor quality, noise or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can successfully identify object classes up to a certain extent. Context information, based on the interaction among objects in the scene or on global scene statistics, can help disambiguate appearance inputs in recognition tasks. In this work we review different approaches to using contextual information in the field of object categorization and discuss scalability, optimizations and possible future approaches.
Object-Graphs for Context-Aware Category Discovery
2009
Cited by 4 (1 self)
How can knowing about some categories help us to discover new ones in unlabeled images? Unsupervised visual category discovery is useful to mine for recurring objects without human supervision, but existing methods assume no prior information and thus tend to perform poorly for cluttered scenes with multiple objects. We propose to leverage knowledge about previously learned categories to enable more accurate discovery. We introduce a novel object-graph descriptor to encode the layout of object-level co-occurrence patterns relative to an unfamiliar region, and show that by using it to model the interaction between an image’s known and unknown objects we can better detect new visual categories. Rather than mine for all categories from scratch, our method can continually identify new objects while drawing on useful cues from familiar ones. We evaluate our approach on benchmark datasets and demonstrate clear improvements in discovery over conventional purely appearance-based baselines.
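One way to picture "the layout of object-level co-occurrence patterns relative to an unfamiliar region" is sketched below. It is a simplified, hypothetical descriptor in the spirit of the abstract, not the paper's definition: each surrounding region carries a posterior over known classes, and for r = 1..R the posteriors of the r nearest regions above, and separately below, the unknown region are summed into cumulative histograms. The above/below split, the cumulative sums, and all names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

# A region with a known-class posterior: ((x, y) centroid, P(class | region) per known class)
Region = Tuple[Tuple[float, float], Sequence[float]]


def object_graph(unknown_xy: Tuple[float, float], regions: List[Region],
                 num_classes: int, R: int) -> List[float]:
    """Simplified object-graph-style descriptor for an unfamiliar region (hypothetical).

    For r = 1..R, sum the known-class posteriors of the r nearest regions above the
    unknown region, and separately of the r nearest below, then concatenate these
    cumulative histograms. Output length is 2 * R * num_classes.
    """
    uy = unknown_xy[1]

    def nearest(is_on_side) -> List[Sequence[float]]:
        cand = [(math.dist(unknown_xy, xy), post) for xy, post in regions if is_on_side(xy[1])]
        cand.sort(key=lambda c: c[0])  # sort by distance only
        return [post for _, post in cand[:R]]

    # Image y grows downward, so "above" means a smaller y coordinate.
    above = nearest(lambda y: y < uy)
    below = nearest(lambda y: y >= uy)
    desc: List[float] = []
    for neighbours in (above, below):
        running = [0.0] * num_classes
        for r in range(R):
            if r < len(neighbours):  # fewer than R regions on this side: histogram stops growing
                running = [a + b for a, b in zip(running, neighbours[r])]
            desc.extend(running)
    return desc
```

Two unknown regions that both sit just below "road"-like regions and above "sky"-like regions would receive similar descriptors even if their own appearance differs, which is the kind of cue the abstract proposes to exploit for discovery.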
Beyond Trees: MRF Inference via Outer-Planar Decomposition
2010
Cited by 4 (1 self)
Maximum a posteriori (MAP) inference in Markov Random Fields (MRFs) is NP-hard in general, and thus research has focused on either finding efficiently solvable subclasses (e.g. trees) or approximate algorithms (e.g. Loopy Belief Propagation (BP) and Tree-reweighted (TRW) methods). This paper presents a unifying perspective on these approximate techniques called “Decomposition Methods”. These are methods that decompose the given problem over a graph into tractable subproblems over subgraphs and then employ message passing over these subgraphs to merge the solutions of the subproblems into a global solution. This provides a new way of thinking about BP and TRW as successive steps in a hierarchy of decomposition methods. Using this framework, we take a principled first step towards extending this hierarchy beyond trees. We leverage a new class of graphs amenable to exact inference, called outerplanar graphs, and propose an approximate inference algorithm called Outer-Planar Decomposition (OPD). OPD is a strict generalization of BP and TRW, and contains both of them as special cases. Our experiments show that this extension beyond trees is indeed very powerful: OPD outperforms current state-of-the-art inference methods on hard non-submodular synthetic problems and is competitive on real computer vision applications.
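OPD itself is not reproduced here. As a minimal sketch of the general decomposition idea, the code below splits the edges of a small MRF into subproblems, solves each exactly, and adjusts Lagrange multipliers so the subproblems agree on each node's label; the sum of the subproblem minima is a lower bound on the MAP energy. Two substitutions should be stated plainly: brute-force enumeration stands in for exact tree or outer-planar solvers (so this only works on tiny graphs), and the multipliers are updated by projected subgradient ascent rather than by message passing. All function names are hypothetical.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
Pairwise = Dict[Edge, Sequence[Sequence[float]]]  # edge -> cost[label_i][label_j]


def energy(x: Sequence[int], unary, pairwise: Pairwise) -> float:
    """MRF energy: sum of unary costs plus pairwise costs over all edges."""
    return (sum(unary[i][x[i]] for i in range(len(unary)))
            + sum(w[x[i]][x[j]] for (i, j), w in pairwise.items()))


def solve_subproblem(nodes, edges, unary_share, lam, pairwise):
    """Exact MAP of one small subproblem by brute-force enumeration.
    Stands in for an exact tree or outer-planar solver; only viable for tiny graphs."""
    num_labels = len(unary_share[nodes[0]])
    best, best_x = float("inf"), {}
    for labels in itertools.product(range(num_labels), repeat=len(nodes)):
        x = dict(zip(nodes, labels))
        e = sum(unary_share[i][x[i]] + lam[i][x[i]] for i in nodes)
        e += sum(pairwise[(i, j)][x[i]][x[j]] for (i, j) in edges)
        if e < best:
            best, best_x = e, x
    return best, best_x


def dual_decomposition(unary, pairwise: Pairwise, subproblems: List[List[Edge]],
                       iters: int = 100, step: float = 0.5) -> float:
    """Lower bound on the MAP energy via subgradient dual decomposition.

    Each edge must belong to exactly one subproblem and every node to at least one.
    Node unaries are split evenly among the subproblems containing the node; the
    multipliers (which sum to zero per node and label) push the subproblems toward
    agreeing on each node's label.
    """
    n, num_labels = len(unary), len(unary[0])
    sub_nodes = [sorted({v for e in edges for v in e}) for edges in subproblems]
    count = [sum(i in ns for ns in sub_nodes) for i in range(n)]
    shares = [{i: [c / count[i] for c in unary[i]] for i in ns} for ns in sub_nodes]
    lams = [{i: [0.0] * num_labels for i in ns} for ns in sub_nodes]
    best_bound = float("-inf")
    for t in range(iters):
        results = [solve_subproblem(ns, edges, sh, lam, pairwise)
                   for ns, edges, sh, lam in zip(sub_nodes, subproblems, shares, lams)]
        # The sum of subproblem minima is a valid lower bound for any feasible multipliers.
        best_bound = max(best_bound, sum(val for val, _ in results))
        # Projected supergradient ascent step with a diminishing step size.
        for i in range(n):
            owners = [s for s, ns in enumerate(sub_nodes) if i in ns]
            for label in range(num_labels):
                votes = [1.0 if results[s][1][i] == label else 0.0 for s in owners]
                avg = sum(votes) / len(votes)
                for s, v in zip(owners, votes):
                    lams[s][i][label] += step / (t + 1) * (v - avg)
    return best_bound
```

On a frustrated triangle (three nodes that each pay 1 for sharing a label with a neighbour) the true minimum is 1, while the best bound obtainable from tree subproblems is 0. A triangle is itself outer-planar, which illustrates why larger tractable subproblems than trees can tighten the bound.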
Unsupervised Learning of Hierarchical Spatial Structures In Images
Cited by 3 (0 self)
The visual world exhibits organized spatial patterns among objects or regions in a scene, object-parts in an object, and low-level features in object-parts. These classes of spatial structures are inherently hierarchical in nature. Although seemingly quite different, these spatial patterns are simply manifestations of different levels in a hierarchy. In this work, we present a unified approach to unsupervised learning of hierarchical spatial structures from a collection of images. Ours is a hierarchical rule-based model capturing spatial patterns, where each rule is represented by a star-graph. We propose an unsupervised EM-style algorithm to learn our model from a collection of images. We show that the inference problem of determining the set of learnt rules instantiated in an image is equivalent to finding the minimum-cost Steiner tree in a directed acyclic graph. We evaluate our approach on a diverse set of datasets of object categories, natural outdoor scenes and images from complex street scenes with multiple objects.
Exemplar-based Representations for Object Detection, Association and Beyond
2011
Cited by 2 (2 self)
Recognizing and reasoning about the objects found in an image is one of the key problems in computer vision. This thesis is based on the idea that in order to understand a novel object, it is often not enough to recognize the object category it belongs to (i.e., answering “What is this?”). We argue that a more meaningful interpretation can be obtained by linking the input object with a similar representation in memory (i.e., asking “What is this like?”). In this thesis, we present a memory-based system for recognizing and interpreting objects in images by establishing visual associations between an input image and a large database of object exemplars. These visual associations can then be used to predict properties of the novel object that cannot be deduced solely from category membership (e.g., which way is it facing? what is its segmentation? is there a person sitting on it?). Part I of this thesis is dedicated to exemplar representations and algorithms for creating visual associations. We propose Local Distance Functions and Exemplar-SVMs, ...
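As a rough sketch of the Exemplar-SVM idea named above (one linear classifier per exemplar, trained with that exemplar as the only positive against many negatives), and of treating the most strongly responding exemplar as the "visual association", consider the code below. It uses plain full-batch subgradient descent on a regularized, cost-weighted hinge loss; the cost weights, learning rate, epoch count, and function names are assumptions, and the excerpt does not describe the thesis's actual training procedure.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_exemplar_svm(positive: Sequence[float], negatives: List[Sequence[float]],
                       c_pos: float = 1.0, c_neg: float = 0.1, reg: float = 0.01,
                       lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Linear SVM with one positive (the exemplar) and many negatives.

    Minimizes reg/2 * ||w||^2 + sum_i c_i * max(0, 1 - y_i (w.x_i + b)) by full-batch
    subgradient descent; c_pos and c_neg weight the single positive and the negatives.
    """
    w, b = [0.0] * len(positive), 0.0
    data = [(positive, 1.0, c_pos)] + [(x, -1.0, c_neg) for x in negatives]
    for _ in range(epochs):
        gw = [reg * wi for wi in w]
        gb = 0.0
        for x, y, c in data:
            if y * (dot(w, x) + b) < 1.0:  # margin violated, so the hinge term is active
                gw = [g - c * y * xi for g, xi in zip(gw, x)]
                gb -= c * y
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def associate(query: Sequence[float], detectors: List[Tuple[List[float], float]]) -> int:
    """Index of the exemplar whose detector responds most strongly to `query`."""
    scores = [dot(w, query) + b for w, b in detectors]
    return max(range(len(scores)), key=scores.__getitem__)
```

In an exemplar-based system, the index returned by `associate` would then be used to look up metadata stored with that exemplar (for instance its segmentation or facing direction) and transfer it to the query object.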

