
TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context (2009)

by J Shotton, J Winn, C Rother, A Criminisi
Venue: IJCV

Results 1 - 10 of 44

Semantic Texton Forests for Image Categorization and Segmentation

by Jamie Shotton, Matthew Johnson, Roberto Cipolla
Cited by 72 (6 self)
We propose semantic texton forests, efficient and powerful new low-level features. These are ensembles of decision trees that act directly on image pixels, and therefore do not need the expensive computation of filter-bank responses or local descriptors. They are extremely fast to both train and test, especially compared with k-means clustering and nearest-neighbor assignment of feature descriptors. The nodes in the trees provide (i) an implicit hierarchical clustering into semantic textons, and (ii) an explicit local classification estimate. Our second contribution, the bag of semantic textons, combines a histogram of semantic textons over an image region with a region prior category distribution. The bag of semantic textons is computed over the whole image for categorization, and over local rectangular regions for segmentation. Including both histogram and region prior allows our segmentation algorithm to exploit both textural and semantic context. Our third contribution is an image-level prior for segmentation that emphasizes those categories that the automatic categorization believes to be present. We evaluate on two datasets including the very challenging VOC 2007 segmentation dataset. Our results significantly advance the state-of-the-art in segmentation accuracy, and furthermore, our use of efficient decision forests gives at least a five-fold increase in execution speed.
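The abstract's core mechanism is a tree that routes each pixel by simple tests on raw pixel values, combined with a histogram of the leaves ("semantic textons") reached inside a region. A minimal sketch under stated assumptions, not the authors' implementation: the tree is hand-built rather than trained, the split test (difference of two offset pixels against a threshold) is one assumed choice of split function, and `Node`, `leaf_for` and `bag_of_textons` are hypothetical names. The region prior category distribution that the paper also puts into the bag is omitted.

```python
# Hypothetical sketch of pixel-level split tests and a bag of leaf ids; not the authors' code.
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Image = List[List[float]]


@dataclass
class Node:
    # Internal node: go left if pixel(p + (dy1, dx1)) - pixel(p + (dy2, dx2)) < threshold.
    offsets: Optional[Tuple[int, int, int, int]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf: an integer id that plays the role of a semantic texton.
    leaf_id: Optional[int] = None


def pixel(img: Image, y: int, x: int) -> float:
    """Read a pixel, clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def leaf_for(node: Node, img: Image, y: int, x: int) -> int:
    """Route pixel (y, x) down the tree using only raw pixel values."""
    while node.leaf_id is None:
        dy1, dx1, dy2, dx2 = node.offsets
        diff = pixel(img, y + dy1, x + dx1) - pixel(img, y + dy2, x + dx2)
        node = node.left if diff < node.threshold else node.right
    return node.leaf_id


def bag_of_textons(tree: Node, img: Image, y0: int, x0: int, y1: int, x1: int) -> Counter:
    """Histogram of leaf ids over the rectangle [y0, y1) x [x0, x1)."""
    return Counter(leaf_for(tree, img, y, x) for y in range(y0, y1) for x in range(x0, x1))


# A hand-built one-split tree: leaf 0 if the pixel is darker than its right neighbor.
tree = Node(offsets=(0, 0, 0, 1), threshold=0.0, left=Node(leaf_id=0), right=Node(leaf_id=1))
img = [[0.0, 1.0, 1.0],
       [1.0, 1.0, 0.0]]
print(bag_of_textons(tree, img, 0, 0, 2, 3))
```

With a forest, one would compute such a histogram per tree and concatenate the results; the trees themselves would be learned from labeled pixels rather than written by hand.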

Learning 3D mesh segmentation and labeling

by Evangelos Kalogerakis, Aaron Hertzmann, Karan Singh - ACM Trans. on Graphics, 2010
Cited by 22 (3 self)
Figure 1: Labeling and segmentation results from applying our algorithm to one mesh each from every category in the Princeton Segmentation Benchmark [Chen et al. 2009]. For each result, the algorithm was trained on the other meshes in the same class, e.g., the human was labeled after training on the other meshes in the human class.
This paper presents a data-driven approach to simultaneous segmentation and labeling of parts in 3D meshes. An objective function is formulated as a Conditional Random Field model, with terms assessing the consistency of faces with labels and terms between the labels of neighboring faces. The objective function is learned from a collection of labeled training meshes. The algorithm uses hundreds of geometric and contextual label features and learns different types of segmentations for different tasks, without requiring manual parameter tuning. Our algorithm achieves a significant improvement over the state of the art when evaluated on the Princeton Segmentation Benchmark, often producing segmentations and labelings comparable to those produced by humans.
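The objective described above, per-face label terms plus terms between the labels of neighboring faces, can be illustrated with a toy energy. This sketch assumes a simple Potts pairwise term (a fixed penalty, scaled by an edge weight, whenever adjacent faces disagree) and minimizes by brute force over all labelings, which is only feasible for a handful of faces. The paper's learned terms, features and inference are not reproduced, and `crf_energy` and `brute_force_labeling` are hypothetical names.

```python
# Hypothetical toy CRF energy with a Potts pairwise term; not the authors' model.
from itertools import product
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (face a, face b, edge weight)


def crf_energy(labels: Sequence[int], unary: List[List[float]],
               edges: List[Edge], pairwise_weight: float) -> float:
    """Unary cost of each face's label plus a Potts penalty on disagreeing neighbors."""
    energy = sum(unary[f][labels[f]] for f in range(len(labels)))
    for a, b, w in edges:
        if labels[a] != labels[b]:
            energy += pairwise_weight * w
    return energy


def brute_force_labeling(unary: List[List[float]], edges: List[Edge],
                         pairwise_weight: float) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively find the minimum-energy labeling (exponential; toy sizes only)."""
    n_faces, n_labels = len(unary), len(unary[0])
    best = min(product(range(n_labels), repeat=n_faces),
               key=lambda labels: crf_energy(labels, unary, edges, pairwise_weight))
    return best, crf_energy(best, unary, edges, pairwise_weight)


# Three faces in a chain, two labels; unary[f][l] could be a classifier's -log p(l | face f).
unary = [[0.0, 2.0], [1.0, 0.5], [0.0, 3.0]]
edges = [(0, 1, 1.0), (1, 2, 1.0)]
print(brute_force_labeling(unary, edges, pairwise_weight=1.0))
```

Here the middle face prefers label 1 on its own, but the pairwise terms make the consistent labeling of all three faces cheaper overall.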

Image-based Street-side City Modeling

by Jianxiong Xiao, Tian Fang, Peng Zhao, Maxime Lhuillier, Long Quan; affiliation listed: LASMEA, Université Blaise Pascal
Cited by 12 (2 self)
Figure 1: Two close-ups of parts 1 and 2 of a modeled city area, shown in the first two rows. All the models are automatically generated from input images, exemplified by the bottom row. The close-up of part 3 is shown in Figure 15.
We propose an automatic approach to generating street-side, photo-realistic 3D models from images captured at ground level along the streets. We first develop a multi-view semantic segmentation method that recognizes and segments each image at the pixel level into semantically meaningful areas, each labeled with a specific object class, such as building, sky, ground, vegetation and car. A partition scheme is then introduced to separate buildings into independent blocks using the major line structures of the scene. Finally, for each block, we propose an inverse patch-based orthographic composition and structure analysis method for façade modeling that efficiently regularizes the noisy and missing reconstructed 3D data. Our system has the distinct advantage of producing visually compelling results by imposing strong priors of building regularity. We demonstrate the fully automatic system on a typical city example to validate our methodology.
Keywords: Image-based modeling, street view, street-side, building modeling, façade modeling, city modeling, 3D reconstruction.

Stacked Hierarchical Labeling

by Daniel Munoz, J. Andrew Bagnell, Martial Hebert
Cited by 11 (6 self)
In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of the early vision literature in that we use a decomposition of the image to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message-passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence, and we leverage the stacking approach developed by Cohen et al.
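The stacking idea the abstract borrows, training each stage on predictions that the earlier stage made on data it did not see so later stages learn to cope with realistic earlier errors, rests on out-of-fold predictions. A minimal sketch, assuming a generic `fit(xs, ys)` that returns a predictor and using a toy 1-nearest-neighbor learner on scalars; `out_of_fold_predictions` and `nearest_neighbor_fit` are hypothetical helpers, not the authors' code, and the hierarchy over image regions is not modeled.

```python
# Hypothetical sketch of out-of-fold predictions for stacking; not the authors' code.
from typing import Callable, List, Sequence

Predictor = Callable[[float], int]


def nearest_neighbor_fit(xs: Sequence[float], ys: Sequence[int]) -> Predictor:
    """Toy learner: 1-nearest-neighbor on scalar inputs."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def out_of_fold_predictions(xs: Sequence[float], ys: Sequence[int],
                            fit: Callable[[List[float], List[int]], Predictor],
                            k: int) -> List[int]:
    """Predict each example with a model trained on the other k - 1 folds only."""
    n = len(xs)
    preds: List[int] = [0] * n
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
oof = out_of_fold_predictions(xs, ys, nearest_neighbor_fit, k=3)
# Second-stage training inputs: original feature plus the first stage's held-out prediction.
stage2_inputs = list(zip(xs, oof))
print(stage2_inputs)
```

A second-stage learner would be trained on `stage2_inputs`; at test time, a first-stage model retrained on all the data would supply the prediction it consumes.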

Discriminative Learning with Latent Variables for Cluttered Indoor Scene Understanding

by Huayan Wang, Stephen Gould, Daphne Koller - ECCV, 2010
Cited by 10 (0 self)
We address the problem of understanding an indoor scene from a single image in terms of recovering the layouts of the faces (floor, ceiling, walls) and furniture. A major challenge of this task arises from the fact that most indoor scenes are cluttered by furniture and decorations, whose appearances vary drastically across scenes and can hardly be modeled (or even hand-labeled) consistently. In this paper we tackle this problem by introducing latent variables to account for clutter, so that the observed image is jointly explained by the face and clutter layouts. Model parameters are learned in the maximum margin formulation, which is constrained by extra prior energy terms that define the role of the latent variables. Our approach enables taking into account and inferring indoor clutter layouts without hand-labeling of the clutter in the training set, yet it outperforms the state-of-the-art method of Hedau et al. [4], which requires clutter labels.

Harmony Potentials for Joint Classification and Segmentation

by Josep M. Gonfaus, Xavier Boix, Joost Van De Weijer, Andrew D. Bagdanov, Joan Serrat, Jordi Gonzàlez - In Conference on Computer Vision and Pattern Recognition, 2010
Cited by 9 (1 self)
Hierarchical conditional random fields have been successfully applied to object segmentation. One reason is their ability to incorporate contextual information at different scales. However, these models do not allow multiple labels to be assigned to a single node. At higher scales in the image, this yields an oversimplified model, since multiple classes can reasonably be expected to appear within one region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Neglecting the information at larger scales is undesirable, since class-label estimates based on these scales are more reliable than those at smaller, noisier scales. To address this problem, we propose a new potential, called the harmony potential, which can encode any possible combination of class labels. We propose an effective sampling strategy that renders the underlying optimization problem tractable. Results show that our approach obtains state-of-the-art results on two challenging datasets: Pascal VOC 2009 and MSRC-21.

Efficiently Combining Contour and Texture Cues for Object Recognition

by Jamie Shotton, Andrew Blake, Roberto Cipolla
Cited by 5 (0 self)
This paper proposes an efficient fusion of contour and texture cues for image categorization and object detection. Our work confirms and strengthens recent results that combining complementary feature types improves performance. We obtain a similar improvement in accuracy and additionally an improvement in efficiency. We use a boosting algorithm to learn models that use contour and texture features. Our main contributions are (i) the use of dense generic texture features to complement contour fragments, and (ii) a simple feature selection mechanism that includes the computational costs of features in order to learn a run-time efficient model. Our evaluation on 17 challenging and varied object classes confirms that the two feature types together perform significantly better than either alone, and that computational efficiency is substantially improved by our feature selection mechanism. An investigation of the boosted features shows a fascinating emergent property: the absence of certain textures often contributes towards object detection. Comparison with recent work shows that performance is state of the art.
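One way to fold computational cost into boosted feature selection, in the spirit of contribution (ii), is to add a penalty to the weak-learner selection score the first time a feature is used. The sketch below is an assumption for illustration, not the paper's mechanism: it runs discrete AdaBoost with decision stumps and scores each candidate as weighted error plus `lam * cost` for features not yet selected. `cost_aware_adaboost`, `stump`, `predict`, `lam` and `costs` are hypothetical names.

```python
# Hypothetical cost-penalized stump selection inside discrete AdaBoost; not the paper's method.
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, polarity)


def stump(x: Sequence[float], f: int, thr: float, sign: int) -> int:
    """Decision stump: +sign if feature f reaches the threshold, else -sign."""
    return sign if x[f] >= thr else -sign


def cost_aware_adaboost(X: List[Sequence[float]], y: List[int], costs: List[float],
                        rounds: int, lam: float) -> List[Stump]:
    """A feature's cost is charged only while it has not yet been selected."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    used = set()
    model: List[Stump] = []
    for _ in range(rounds):
        best = None
        for f in range(d):
            penalty = 0.0 if f in used else lam * costs[f]
            for thr in sorted({x[f] for x in X}):
                for sign in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w) if stump(xi, f, thr, sign) != yi)
                    if best is None or err + penalty < best[0]:
                        best = (err + penalty, err, f, thr, sign)
        _, err, f, thr, sign = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, f, thr, sign))
        used.add(f)
        # Reweight: misclassified examples gain weight, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, f, thr, sign)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Stump], x: Sequence[float]) -> int:
    score = sum(alpha * stump(x, f, thr, sign) for alpha, f, thr, sign in model)
    return 1 if score >= 0 else -1


# Both features separate the classes equally well; feature 1 is cheaper to compute.
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
y = [-1, -1, 1, 1]
model = cost_aware_adaboost(X, y, costs=[5.0, 1.0], rounds=2, lam=0.1)
print([f for _, f, _, _ in model])
```

With `lam = 0` the two features tie and the first one found wins; any positive `lam` tips the choice toward the cheaper feature, and once a feature is paid for it can be reused at no extra cost.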

Efficient Minimization of Decomposable Submodular Functions

by Peter Stobbe, Andreas Krause
Cited by 5 (0 self)
Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general-purpose submodular minimization algorithms by several orders of magnitude.
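The defining form above, a sum of concave functions applied to modular functions, can be written as f(S) = Σ_j φ_j(Σ_{i∈S} w_j[i]) with each φ_j concave. The sketch below builds such a function and checks the diminishing-returns condition by brute force on a small ground set. It assumes nonnegative weights (under which concavity of each φ_j guarantees submodularity), does not implement SLG, and uses hypothetical helper names `decomposable` and `is_submodular`.

```python
# Hypothetical sketch: build a decomposable set function and brute-force check submodularity.
import math
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

SetFn = Callable[[FrozenSet[int]], float]


def decomposable(weights: List[Sequence[float]],
                 concave_fns: List[Callable[[float], float]]) -> SetFn:
    """f(S) = sum_j phi_j(sum_{i in S} w_j[i]); each phi_j is assumed concave."""
    def f(S: FrozenSet[int]) -> float:
        return sum(phi(sum(w[i] for i in S)) for w, phi in zip(weights, concave_fns))
    return f


def is_submodular(f: SetFn, n: int, tol: float = 1e-12) -> bool:
    """Check f(A | {i}) - f(A) >= f(B | {i}) - f(B) for all A subset of B, i not in B."""
    ground = range(n)
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for i in ground:
                if i in B:
                    continue
                if f(A | {i}) - f(A) < f(B | {i}) - f(B) - tol:
                    return False
    return True


# Two concave functions of nonnegative modular functions on a 4-element ground set.
f = decomposable([[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 1.0]],
                 [math.sqrt, lambda t: min(t, 2.0)])
# A convex phi (t squared) breaks diminishing returns.
g = decomposable([[1.0, 1.0, 1.0]], [lambda t: t * t])
print(is_submodular(f, 4), is_submodular(g, 3))
```

The exhaustive check is exponential in n; the point of the paper is to exploit this decomposable structure to minimize such functions at a scale where enumeration is impossible.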

Context Based Object Categorization: A Critical Survey

by Carolina Galleguillos, Serge Belongie
Cited by 4 (0 self)
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images present occlusion, poor quality, noise or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can successfully identify object classes to a certain extent. Context information, based on the interaction among objects in the scene or on global scene statistics, can help disambiguate appearance inputs in recognition tasks. In this work we review different approaches to using contextual information in the field of object categorization and discuss scalability, optimizations, and possible future approaches.

Geodesic Image and Video Editing

by Antonio Criminisi, Toby Sharp, Carsten Rother, Patrick Pérez, 2010
Cited by 4 (0 self)
This paper presents a new, unified technique to perform general edge-sensitive editing operations on n-dimensional images and videos efficiently. The first contribution of the paper is the introduction of a generalized geodesic distance transform (GGDT), based on soft masks. This provides a unified framework to address several edge-aware editing operations. Diverse editing tasks such as denoising and non-photorealistic rendering are all handled by fundamentally the same fast algorithm. Second, a new geodesic, symmetric filter (GSF) is presented, which imposes contrast-sensitive spatial smoothness on segmentation and segmentation-based editing tasks (cutout, object highlighting, colorization, panorama stitching). The effect of the filter is controlled by two intuitive, geometric parameters. In contrast to existing techniques, the GSF filter is applied to real-valued pixel likelihoods (soft masks), thanks to GGDTs, and it can be used for both interactive and automatic editing tasks. Complex object topologies are dealt with effortlessly. Finally, the parallelism of GGDTs enables us to exploit modern multi-core CPU architectures as well as powerful new GPUs, thus providing great flexibility of implementation and deployment. Our technique operates on both images and videos, and generalizes naturally to n-dimensional data. The proposed algorithm is validated via quantitative and qualitative comparisons with existing state-of-the-art approaches. Numerous results on a variety of image and video editing tasks further demonstrate the effectiveness of our method.
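A generalized geodesic distance transform of a soft mask M assigns each pixel x the minimum, over pixels x', of (geodesic distance from x' to x) + ν·M(x'), where path length grows with the image gradient along the path. The sketch below computes a discrete version on a 4-connected grayscale grid with Dijkstra's algorithm, charging each unit step sqrt(1 + γ²·ΔI²). This is a swapped-in technique chosen for clarity: the paper stresses fast, parallelizable computation, which this does not reproduce, and the step-cost form, `gamma`, `nu` and the name `generalized_geodesic_distance` are assumptions here.

```python
# Hypothetical Dijkstra-based sketch of a generalized geodesic distance; not the paper's algorithm.
import heapq
import math
from typing import List

Grid = List[List[float]]


def generalized_geodesic_distance(image: Grid, mask: Grid,
                                  gamma: float = 1.0, nu: float = 1.0) -> Grid:
    """D(x) = min over x' of [geodesic distance(x', x) + nu * M(x')] on a 4-connected grid.

    Each unit step between neighbors p and q costs sqrt(1 + gamma^2 * (I(q) - I(p))^2),
    so paths that cross strong intensity edges are longer.
    """
    h, w = len(image), len(image[0])
    # Every pixel is a potential source whose starting cost is nu * M(x').
    dist = [[nu * mask[y][x] for x in range(w)] for y in range(h)]
    heap = [(dist[y][x], y, x) for y in range(h) for x in range(w)]
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue  # stale heap entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                step = math.sqrt(1.0 + gamma ** 2 * (image[ny][nx] - image[y][x]) ** 2)
                if d + step < dist[ny][nx]:
                    dist[ny][nx] = d + step
                    heapq.heappush(heap, (d + step, ny, nx))
    return dist


# One-row image with an intensity edge between columns 1 and 2; seed region (M = 0) at column 0.
image = [[0.0, 0.0, 5.0, 5.0]]
mask = [[0.0, 1.0, 1.0, 1.0]]
print(generalized_geodesic_distance(image, mask, gamma=1.0, nu=100.0))
```

With `gamma = 0` the result reduces to an ordinary (city-block) distance from the seed region; a large `nu` makes the mask behave like a hard binary seed, while a small `nu` lets uncertain mask values act as partial seeds.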

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University