• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Nonparametric scene parsing: label transfer via dense scene alignment (2009)

by C Liu, J Yuen, A Torralba
Venue:In CVPR
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 21
Next 10 →

Region-based Segmentation and Object Detection

by Stephen Gould, Tianshi Gao, Daphne Koller
"... Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference c ..."
Abstract - Cited by 10 (1 self) - Add to MetaCart
Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

SuperParsing: Scalable Nonparametric Image Parsing with Superpixels

by Joseph Tighe, Svetlana Lazebnik
"... Abstract. This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach requires no training, and it can easily scale to datasets with ten ..."
Abstract - Cited by 10 (0 self) - Add to MetaCart
Abstract. This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach requires no training, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. It works by scene-level matching with global image descriptors, followed by superpixel-level matching with local features and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 15,150 images and 170 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Key words: scene understanding, image parsing, image segmentation 1

Segmenting Scenes by Matching Image Composites

by Bryan C. Russell, Alexei A. Efros, Josef Sivic, William T. Freeman, Andrew Zisserman - APPEARED AT ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS , 2009
"... In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows f ..."
Abstract - Cited by 8 (2 self) - Add to MetaCart
In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

SIFT Flow: Dense Correspondence across Scenes and its Applications

by Ce Liu, Jenny Yuen, Antonio Torralba
"... While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image ..."
Abstract - Cited by 7 (2 self) - Add to MetaCart
While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images, while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuitypreserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignmentbased large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration and face recognition.

Ensemble of Exemplar-SVMs for Object Detection and Beyond

by Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros
"... This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method is based on training a separate linear SVM classifier for every exemplar ..."
Abstract - Cited by 6 (4 self) - Add to MetaCart
This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase. But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding. 1.

Label propagation in video sequences

by Vijay Badrinarayanan, Fabio Galasso, Roberto Cipolla , 2010
"... This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem. Given a limited amount of hand labelled pixels, typically the start and end frames of a chunk of video, an EM based algorithm propagates labels thr ..."
Abstract - Cited by 5 (2 self) - Add to MetaCart
This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem. Given a limited amount of hand labelled pixels, typically the start and end frames of a chunk of video, an EM based algorithm propagates labels through the rest of the frames of the video sequence. As a result, the user obtains pixelwise labelled video sequences along with the class probabilities at each pixel. Our novel algorithm provides an essential tool to reduce tedious hand labelling of video sequences, thus producing copious amounts of useable ground truth data. A novel application of this algorithm is in semi-supervised learning of discriminative classifiers for video segmentation and scene parsing. The label propagation scheme can be based on pixelwise correspondences obtained from motion estimation, image patch based similarities as seen in epitomic models or even the more recent, semantically consistent hierarchical regions. We compare the abilities of each of these variants, both via quantitative and qualitative studies against ground truth data. We then report studies on a state of the art Random forest classifier based video segmentation scheme, trained using fully ground truth data and with data obtained from label propagation. The results of this study strongly support and encourage the use of the proposed label propagation algorithm. 1.

Semantic Segmentation of Urban Scenes Using Dense Depth Maps

by Chenxi Zhang, Liang Wang, Ruigang Yang
"... Abstract. In this paper we present a framework for semantic scene parsing and object recognition based on dense depth maps. Five viewindependent 3D features that vary with object class are extracted from dense depth maps at a superpixel level for training a classifier using randomized decision fores ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
Abstract. In this paper we present a framework for semantic scene parsing and object recognition based on dense depth maps. Five viewindependent 3D features that vary with object class are extracted from dense depth maps at a superpixel level for training a classifier using randomized decision forest technique. Our formulation integrates multiple features in a Markov Random Field (MRF) framework to segment and recognize different object classes in query street scene images. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video Database (CamVid). The result shows that only using dense depth information, we can achieve overall better accurate segmentation and recognition than that from sparse 3D features or appearance, or even the combination of sparse 3D features and appearance, advancing state-of-the-art performance. Furthermore, by aligning 3D dense depth based features into a unified coordinate frame, our algorithm can handle the special case of view changes between training and testing scenarios. Preliminary evaluation in cross training and testing shows promising results. 1

A data-driven approach for event prediction

by Jenny Yuen, Antonio Torralba
"... Abstract. When given a single static picture, humans can not only interpret the instantaneous content captured by the image, but also they are able to infer the chain of dynamic events that are likely to happen in the near future. Similarly, when a human observes a short video, it is easy to decide ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
Abstract. When given a single static picture, humans can not only interpret the instantaneous content captured by the image, but also they are able to infer the chain of dynamic events that are likely to happen in the near future. Similarly, when a human observes a short video, it is easy to decide if the event taking place in the video is normal or unexpected, even if the video depicts a an unfamiliar place for the viewer. This is in contrast with work in surveillance and outlier event detection, where the models rely on thousands of hours of video recorded at a single place in order to identify what constitutes an unusual event. In this work we present a simple method to identify videos with unusual events in a large collection of short video clips. The algorithm is inspired by recent approaches in computer vision that rely on large databases. In this work we show how, relying on large collections of videos, we can retrieve other videos similar to the query to build a simple model of the distribution of expected motions for the query. Consequently, the model can evaluate how unusual is the video as well as make event predictions. We show how a very simple retrieval model is able to provide reliable results. 1

Modeling and Analysis of Dynamic Behaviors of Web Image Collections

by Gunhee Kim, Eric P. Xing, Antonio Torralba
"... Abstract. Can we model the temporal evolution of topics in Web image collections? If so, can we exploit the understanding of dynamics to solve novel visual problems or improve recognition performance? These two challenging questions are the motivation for this work. We propose a nonparametric approa ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
Abstract. Can we model the temporal evolution of topics in Web image collections? If so, can we exploit the understanding of dynamics to solve novel visual problems or improve recognition performance? These two challenging questions are the motivation for this work. We propose a nonparametric approach to modeling and analysis of topical evolution in image sets. A scalable and parallelizable sequential Monte Carlo based method is developed to construct the similarity network of a large-scale dataset that provides a base representation for wide ranges of dynamics analysis. In this paper, we provide several experimental results to support the usefulness of image dynamics with the datasets of 47 topics gathered from Flickr. First, we produce some interesting observations such as tracking of subtopic evolution and outbreak detection, which cannot be achieved with conventional image sets. Second, we also present the complementary benefits that the images can introduce over the associated text analysis. Finally, we show that the training using the temporal association significantly improves the recognition performance. 1

Exemplar-based Representations for Object Detection, Association and Beyond

by Tomasz Malisiewicz , 2011
"... for supporting my research all these years. Recognizing and reasoning about the objects found in an image is one of the key problems in computer vision. This thesis is based on the idea that in order to understand a novel object, it is often not enough to recognize the object category it belongs to ..."
Abstract - Cited by 2 (2 self) - Add to MetaCart
for supporting my research all these years. Recognizing and reasoning about the objects found in an image is one of the key problems in computer vision. This thesis is based on the idea that in order to understand a novel object, it is often not enough to recognize the object category it belongs to (i.e., answering “What is this?”). We argue that a more meaningful interpretation can be obtained by linking the input object with a similar representation in memory (i.e., asking “What is this like?”). In this thesis, we present a memory-based system for recognizing and interpreting objects in images by establishing visual associations between an input image and a large database of object exemplars. These visual associations can then be used to predict properties of the novel object which cannot be deduced solely from category membership (e.g., which way is it facing? what is its segmentation? is there a person sitting on it?). Part I of this thesis is dedicated to exemplar representations and algorithms for creating visual associations. We propose Local Distance Functions and Exemplar-SVMs,
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University