TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context (2009)

by J Shotton, J Winn, C Rother, A Criminisi
Venue: IJCV

Results 1 - 10 of 217

Semantic Texton Forests for Image Categorization and Segmentation

by Jamie Shotton, Matthew Johnson, Roberto Cipolla
Cited by 304 (13 self)
We propose semantic texton forests, efficient and powerful new low-level features. These are ensembles of decision trees that act directly on image pixels, and therefore do not need the expensive computation of filter-bank responses or local descriptors. They are extremely fast to both train and test, especially compared with k-means clustering and nearest-neighbor assignment of feature descriptors. The nodes in the trees provide (i) an implicit hierarchical clustering into semantic textons, and (ii) an explicit local classification estimate. Our second contribution, the bag of semantic textons, combines a histogram of semantic textons over an image region with a region prior category distribution. The bag of semantic textons is computed over the whole image for categorization, and over local rectangular regions for segmentation. Including both histogram and region prior allows our segmentation algorithm to exploit both textural and semantic context. Our third contribution is an image-level prior for segmentation that emphasizes those categories that the automatic categorization believes to be present. We evaluate on two datasets including the very challenging VOC 2007 segmentation dataset. Our results significantly advance the state-of-the-art in segmentation accuracy, and furthermore, our use of efficient decision forests gives at least a five-fold increase in execution speed.
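The bag of semantic textons described above pairs a texton histogram with a region prior over categories. A minimal sketch of that pairing follows, assuming a single tree whose per-pixel leaf indices are already available, and assuming the region prior can be approximated by averaging per-pixel class estimates over the region. The function name `bag_of_semantic_textons`, the input layout, and the averaging rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def bag_of_semantic_textons(leaf_ids, class_probs, region):
    """Histogram of leaf indices plus averaged class estimates over a region.

    leaf_ids[r][c]    : leaf index reached by pixel (r, c) (hypothetical input)
    class_probs[r][c] : per-pixel class distribution as a list (hypothetical input)
    region            : (top, left, bottom, right), half-open rectangle
    """
    top, left, bottom, right = region
    n_classes = len(class_probs[0][0])
    histogram = Counter()
    prior = [0.0] * n_classes
    n_pixels = 0
    for r in range(top, bottom):
        for c in range(left, right):
            histogram[leaf_ids[r][c]] += 1
            for k in range(n_classes):
                prior[k] += class_probs[r][c][k]
            n_pixels += 1
    if n_pixels == 0:
        raise ValueError("region contains no pixels")
    # Assumption: the region prior is the mean of the per-pixel estimates.
    prior = [v / n_pixels for v in prior]
    return histogram, prior

# Toy 2x2 image with two leaves and two classes.
leaf_ids = [[0, 1], [1, 1]]
class_probs = [[[1.0, 0.0], [0.0, 1.0]],
               [[0.0, 1.0], [0.0, 1.0]]]
histogram, prior = bag_of_semantic_textons(leaf_ids, class_probs, (0, 0, 2, 2))
```

Passing the whole image as the region corresponds to the categorization use mentioned in the abstract; a local rectangle corresponds to the segmentation use.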

Computer Vision: Algorithms and Applications

by Richard Szeliski, 2010
Cited by 252 (2 self)

SLIC Superpixels Compared to State-of-the-Art Superpixel Methods

by Radhakrishna Achanta, Kevin Smith, Aurelien Lucchi, Pascal Fua - PAMI
Cited by 222 (3 self)
Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation. Index Terms: superpixels, segmentation, clustering, k-means.

Citation Context

...gly simple, SLIC is shown to yield state-of-the-art adherence to image boundaries on the Berkeley benchmark [20], and outperforms existing methods when used for segmentation on the PASCAL [7] and MSRC [24] data sets. Furthermore, it is faster and more memory efficient than existing methods. In addition to these quantifiable benefits, SLIC is easy to use, offers flexibility in the compactness and number...
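The k-means adaptation mentioned in the SLIC abstract can be illustrated with a short sketch of the assignment step. It assumes the commonly described SLIC formulation, which is not given on this page: pixels are points (l, a, b, x, y), the distance combines colour and spatial terms weighted by a compactness parameter m and grid interval S, and each centre is compared only with pixels in a 2S x 2S window around it. The names `slic_distance` and `assign` are hypothetical.

```python
import math

def slic_distance(pixel, center, S, m):
    """Combined colour/spatial distance for points given as (l, a, b, x, y)."""
    d_color = math.sqrt(sum((pixel[k] - center[k]) ** 2 for k in range(3)))
    d_space = math.hypot(pixel[3] - center[3], pixel[4] - center[4])
    # Larger m weights spatial proximity more, giving more compact superpixels.
    return math.sqrt(d_color ** 2 + (d_space / S) ** 2 * m ** 2)

def assign(pixels, centers, S, m):
    """One k-means assignment pass with a limited search window per centre."""
    labels = []
    for p in pixels:
        best_label, best_dist = None, math.inf
        for k, c in enumerate(centers):
            # Skip centres whose 2S x 2S window does not contain the pixel.
            if abs(p[3] - c[3]) > S or abs(p[4] - c[4]) > S:
                continue
            d = slic_distance(p, c, S, m)
            if d < best_dist:
                best_label, best_dist = k, d
        labels.append(best_label)  # None if no centre is within range
    return labels

# Two pixels of identical colour, each close to a different centre.
pixels = [(50, 0, 0, 1, 1), (50, 0, 0, 9, 9)]
centers = [(50, 0, 0, 0, 0), (50, 0, 0, 10, 10)]
labels = assign(pixels, centers, S=10, m=10)
```

A full implementation would alternate this pass with recomputing each centre as the mean of its assigned pixels; the limited window is what keeps the cost independent of the number of superpixels.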

Learning 3D mesh segmentation and labeling

by Evangelos Kalogerakis, Aaron Hertzmann, Karan Singh - ACM Trans. on Graphics, 2010
Cited by 101 (7 self)
Figure 1: Labeling and segmentation results from applying our algorithm to one mesh each from every category in the Princeton Segmentation Benchmark [Chen et al. 2009]. For each result, the algorithm was trained on the other meshes in the same class, e.g., the human was labeled after training on the other meshes in the human class.

This paper presents a data-driven approach to simultaneous segmentation and labeling of parts in 3D meshes. An objective function is formulated as a Conditional Random Field model, with terms assessing the consistency of faces with labels, and terms between labels of neighboring faces. The objective function is learned from a collection of labeled training meshes. The algorithm uses hundreds of geometric and contextual label features and learns different types of segmentations for different tasks, without requiring manual parameter tuning. Our algorithm achieves a significant improvement in results over the state-of-the-art when evaluated on the Princeton Segmentation Benchmark, often producing segmentations and labelings comparable to those produced by humans.

Baby Talk: Understanding and Generating Simple Image Descriptions

by Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, Tamara L Berg
Cited by 82 (0 self)
We posit that visually descriptive language offers computer vision researchers both information about the world, and information about how people describe the world. The potential benefit from this source is made more significant due to the enormous amount of language data easily available today. We present a system to automatically generate natural language descriptions from images that exploits both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision. The system is very effective at producing relevant sentences for images. It also generates descriptions that are notably more true to the specific image content than previous work.

Citation Context

...onships between labeled parts – either detections or regions – of images was used to improve labeling accuracy, but the spatial relationships themselves were not considered outputs in their own right [24, 7, 16, 21, 15]. Estimates of spatial relationships between objects form an important part of the output of the computer vision aspect of our approach and are used to drive sentence generation. There is a great deal...

Nonparametric Scene Parsing via Label Transfer

by Ce Liu, Jenny Yuen, Antonio Torralba, 2011
Cited by 66 (3 self)
While there has been a lot of recent work on object recognition and image understanding, the focus has been on carefully establishing mathematical models for images, scenes, and objects. In this paper, we propose a novel, nonparametric approach for object recognition and scene parsing using a new technology we name label transfer. For an input image, our system first retrieves its nearest neighbors from a large database containing fully annotated images. Then, the system establishes dense correspondences between the input image and each of the nearest neighbors using the dense SIFT flow algorithm [28], which aligns two images based on local image structures. Finally, based on the dense scene correspondences obtained from SIFT flow, our system warps the existing annotations and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Promising experimental results have been achieved by our nonparametric scene parsing system on challenging databases. Compared to existing object recognition approaches that require training classifiers or appearance models for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.

Citation Context

...fer of labels from existing annotated images, rather than building a comprehensive object recognition system. We show, however, that the performance of our system outperforms existing approaches [8], [43] on our databases. Our code and databases can be downloaded at http://people.csail.mit.edu/celiu/LabelTransfer/. This paper is organized as follows: In Section 2, we briefly survey the object recognit...

Stacked Hierarchical Labeling

by Daniel Munoz, J. Andrew Bagnell, Martial Hebert
Cited by 63 (17 self)
In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.

Citation Context

... but as Fig. 2 also illustrates, this assumption is not always true, especially for more complex scenes. With our hierarchical approach, we demonstrate state-of-the-art performance on SBD and MSRC-21 [26] with the added benefit of drastically simpler computations over global methods. 2 Background 2.1 Motivation Random field models in vision have proven to be an effective tool and are also attractive d...

Harmony Potentials for Joint Classification and Segmentation

by Josep M. Gonfaus, Xavier Boix, Joost Van De Weijer, Andrew D. Bagdanov, Joan Serrat, Jordi Gonzàlez - In Conference on Computer Vision and Pattern Recognition, 2010
Cited by 57 (2 self)
Hierarchical conditional random fields have been successfully applied to object segmentation. One reason is their ability to incorporate contextual information at different scales. However, these models do not allow multiple labels to be assigned to a single node. At higher scales in the image, this yields an oversimplified model, since multiple classes can reasonably be expected to appear within one region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Neglecting the information at larger scales is undesirable since class-label estimates based on these scales are more reliable than at smaller, noisier scales. To address this problem, we propose a new potential, called harmony potential, which can encode any possible combination of class labels. We propose an effective sampling strategy that renders tractable the underlying optimization problem. Results show that our approach obtains state-of-the-art results on two challenging datasets: Pascal VOC 2009 and MSRC-21.

Citation Context

...al scale considers the entire image. One of the most successful trends in object class image segmentation poses this labeling problem as one of energy minimization of a conditional random field (CRF) [20, 6, 22]. In this paper we also adopt this framework but focus on the crucial point of how to efficiently represent and combine context at various scales. Repr...

Object Recognition as Ranking Holistic Figure-Ground Hypotheses

by Fuxin Li, Joao Carreira, Cristian Sminchisescu - In CVPR, 2010
Cited by 55 (13 self)
We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object independent process. Decisions are performed based on continuous estimates of the spatial overlap between image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in framing recognition as a regression problem. Instead of focusing on a one-vs-all winning margin that can scramble ordering inside the non-maximum (non-winning) set, learning produces a globally consistent ranking with close ties to segment quality, hence to the extent entire object or part hypotheses spatially overlap with the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection and semantic segmentation, in a number of challenging datasets including Caltech-101, ETHZ-Shape and PASCAL VOC 2009.

Citation Context

...s the reliance on the structure of the hierarchical segmentation, which may not always be stable. Another set of bottom-up approaches decides the object category directly at the level of image pixels [26, 53], or superpixels [18, 20], based on features extracted over a supporting neighborhood. Textonboost [53] classifies each pixel using a linear predictor on texton-layout features, learned using boosting....

Track to the Future: Spatio-temporal Video Segmentation with Long-range Motion Cues

by José Lezama, Karteek Alahari, Josef Sivic, Ivan Laptev
Cited by 52 (2 self)
Video provides not only rich visual cues such as motion and appearance, but also much less explored long-range temporal interactions among objects. We aim to capture such interactions and to construct a powerful intermediate-level video representation for subsequent recognition. Motivated by this goal, we seek to obtain spatio-temporal oversegmentation of a video into regions that respect object boundaries and, at the same time, associate object pixels over many video frames. The contributions of this paper are two-fold. First, we develop an efficient spatio-temporal video segmentation algorithm, which naturally incorporates long-range motion cues from the past and future frames in the form of clusters of point tracks with coherent motion. Second, we devise a new track clustering cost function that includes occlusion reasoning, in the form of depth ordering constraints, as well as motion similarity along the tracks. We evaluate the proposed approach on a challenging set of video sequences of office scenes from feature-length movies.

Citation Context

...oes not include any unary terms, which model the cost of assigning labels to each random variable independently. The cost function can be extended to incorporate unary potentials, similar to those in [21, 28]. 2.2. Similarity constraints To assign similar tracks to the same clusters, we use potential φ1(xi, xj) which takes the form of a standard Potts model [4, 7], i.e., φ1(xi, xj) = { 0 if xi = xj, 1 oth...
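The standard Potts potential quoted in this context can be written out directly. The sketch below assumes the usual definition (cost 0 when two labels agree, 1 otherwise) and sums it over a set of track pairs; the names `potts` and `pairwise_cost` and the toy track identifiers are illustrative only and do not come from the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

def potts(x_i: int, x_j: int) -> int:
    """Standard Potts potential: 0 if the two labels agree, 1 otherwise."""
    return 0 if x_i == x_j else 1

def pairwise_cost(labels: Dict[Hashable, int],
                  edges: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Sum of Potts potentials over all connected pairs (no unary terms)."""
    return sum(potts(labels[i], labels[j]) for i, j in edges)

# Three tracks, two assigned to cluster 0 and one to cluster 1.
labels = {"t1": 0, "t2": 0, "t3": 1}
edges = [("t1", "t2"), ("t2", "t3"), ("t1", "t3")]
total = pairwise_cost(labels, edges)  # 0 + 1 + 1
```

Minimising this sum alone favours giving every connected pair the same label, which is why such a term is normally combined with other potentials.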
