TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object . . .
- In ECCV, 2006
- Cited by 142 (12 self)
This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs. Our discriminative model exploits novel features, based on textons, which jointly model shape and texture. Unary classification and feature selection are achieved using shared boosting to give an efficient classifier which can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating these classifiers in a conditional random field. Efficient training
A comparative study of energy minimization methods for Markov random fields
- In ECCV, 2006
- Cited by 120 (15 self)
One of the most exciting advances in early vision has been the development of efficient energy minimization algorithms. Many early vision tasks require labeling each pixel with some quantity such as depth or texture. While many such problems can be elegantly expressed in the language of Markov Random Fields (MRFs), the resulting energy minimization problems were widely viewed as intractable. Recently, algorithms such as graph cuts and loopy belief propagation (LBP) have proven to be very powerful: for example, such methods form the basis for almost all the top-performing stereo methods. Unfortunately, most papers define their own energy function, which is minimized with a specific algorithm of their choice. As a result, the tradeoffs among different energy minimization algorithms are not well understood. In this paper we describe a set of energy minimization benchmarks, which we use to compare the solution quality and running time of several common energy minimization algorithms. We investigate three promising recent methods (graph cuts, LBP, and tree-reweighted message passing) as well as the well-known older iterated conditional modes (ICM) algorithm. Our benchmark problems are drawn from published energy functions used for stereo, image stitching and interactive segmentation. We also provide a general-purpose software interface that allows vision researchers to easily switch between optimization methods with minimal overhead. We expect that the availability of our benchmarks and interface will make it significantly easier for vision researchers to adopt the best method for their specific problems. Benchmarks, code, results and images are available at
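As a rough illustration of the kind of pixel-labeling energy this abstract refers to, the sketch below runs iterated conditional modes (ICM) on a small grid with a Potts smoothness term. It is a minimal sketch under stated assumptions, not the benchmark code from the paper: the function names `potts_energy` and `icm`, the 4-connected neighbourhood, and the weight `lam` are all chosen here for illustration.

```python
def potts_energy(labels, unary, lam):
    """Energy = sum of per-pixel unary costs + lam for each 4-connected
    neighbouring pair whose labels differ (a Potts smoothness term)."""
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            energy += unary[y][x][labels[y][x]]
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                energy += lam
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                energy += lam
    return energy


def icm(unary, lam, max_sweeps=10):
    """Iterated conditional modes: greedily re-label one pixel at a time,
    keeping all other pixels fixed, until no pixel changes."""
    h, w, k = len(unary), len(unary[0]), len(unary[0][0])
    # Start from the labeling that minimises the unary term alone.
    labels = [[min(range(k), key=lambda l: unary[y][x][l]) for x in range(w)]
              for y in range(h)]
    for _ in range(max_sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                def local_cost(l):
                    # Cost of giving pixel (y, x) label l with neighbours fixed.
                    cost = unary[y][x][l]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != l:
                            cost += lam
                    return cost
                best = min(range(k), key=local_cost)
                # Only accept strict improvements, so the energy never increases.
                if local_cost(best) < local_cost(labels[y][x]):
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels


# A 1x3 "image" with two labels; the middle pixel weakly prefers label 1.
unary = [[[0.0, 1.0], [0.6, 0.4], [0.0, 1.0]]]
smoothed = icm(unary, lam=0.5)
```

ICM only reaches a local minimum of the energy, which is why it serves as the older baseline against graph cuts, LBP and tree-reweighted message passing in a comparison like the one described above.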
Object categorization by learned universal visual dictionary
- In ICCV, 2005
- Cited by 114 (8 self)
Figure 1: Exemplar snapshots of our interactive object categorization demo application. A user selects (sloppily) a region of interest and our algorithm associates an object class label with it. Despite large differences in pose, size, illumination and visual appearance the correct class label (e.g. cow, building, car...) is automatically associated with each selected object instance. Some of these test images were downloaded from the web and none were part of the training set. A video of the interactive demo may be found at the above web site.
This paper presents a new algorithm for the automatic recognition of object classes from images (categorization). Compact and yet discriminative appearance-based object class models are automatically learned from a set of training images. The method is simple and extremely fast, making it suitable for many applications such as semantic image retrieval, web search, and interactive image editing. It classifies a region according to the proportions of different visual words (clusters in feature space). The specific visual words and the typical proportions in each object are learned from a segmented training set. The main contribution of this paper is twofold: i) an optimally compact visual dictionary is learned by pair-wise merging of visual words from an initially large dictionary. The final visual words are described by GMMs. ii) A novel statistical measure of discrimination is proposed which is optimized by each merge operation. High classification accuracy is demonstrated for nine object classes on photographs of real objects viewed under general lighting conditions, poses and viewpoints. The set of test images used for validation comprises: i) photographs acquired by us, ii) images from the web and iii) images from the recently released Pascal dataset. The proposed algorithm performs well on both texture-rich objects (e.g. grass, sky, trees) and structure-rich ones (e.g. cars, bikes, planes).
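The idea of classifying a region by its proportions of visual words can be sketched as follows. This is a simplified illustration, not the method of the paper: it assigns each feature to its nearest cluster centre (the paper describes its final words with GMMs) and compares histograms with a plain squared Euclidean distance in place of the paper's learned measure. The names `nearest_word`, `word_proportions`, `classify_region` and the toy dictionary and class models are hypothetical.

```python
def nearest_word(feature, dictionary):
    """Index of the visual word (cluster centre) closest to the feature."""
    return min(range(len(dictionary)),
               key=lambda i: sum((f - c) ** 2 for f, c in zip(feature, dictionary[i])))


def word_proportions(features, dictionary):
    """Normalised histogram: fraction of region features assigned to each word."""
    if not features:
        raise ValueError("region contains no features")
    counts = [0] * len(dictionary)
    for feature in features:
        counts[nearest_word(feature, dictionary)] += 1
    return [c / len(features) for c in counts]


def classify_region(features, dictionary, class_models):
    """Pick the class whose typical word proportions are closest to the region's.
    Squared Euclidean distance is an assumption made for this sketch."""
    hist = word_proportions(features, dictionary)

    def distance(model):
        return sum((h - m) ** 2 for h, m in zip(hist, model))

    return min(class_models, key=lambda name: distance(class_models[name]))


# Toy setup: two visual words in a 2-D feature space, two classes.
dictionary = [(0.0, 0.0), (1.0, 1.0)]
class_models = {"grass": [0.9, 0.1], "car": [0.2, 0.8]}
region = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.8, 1.0)]
label = classify_region(region, dictionary, class_models)
```

In this toy case three of the four features fall on the second word, so the region's proportions are closest to the hypothetical "car" model.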
LOCUS: Learning Object Classes with Unsupervised Segmentation
- In ICCV, 2005
- Cited by 90 (5 self)
We address the problem of learning object class models and object segmentations from unannotated images. We introduce LOCUS (Learning Object Classes with Unsupervised Segmentation) which uses a generative probabilistic model to combine bottom-up cues of color and edge with top-down cues of shape and pose. A key aspect of this model is that the object appearance is allowed to vary from image to image, allowing for significant within-class variation. By iteratively updating the belief in the object’s position, size, segmentation and pose, LOCUS avoids making hard decisions about any of these quantities and so allows for each to be refined at any stage. We show that LOCUS successfully learns an object class model from unlabeled images, whilst also giving segmentation accuracies that rival existing supervised methods. Finally, we demonstrate simultaneous recognition and segmentation in novel images using the learned models for a number of object classes, as well as unsupervised object discovery and tracking in video.
Graph Cuts and Efficient N-D Image Segmentation
- 2006
- Cited by 74 (3 self)
Combinatorial graph cut algorithms have been successfully applied to a wide range of problems in vision and graphics. This paper focusses on possibly the simplest application of graph-cuts: segmentation of objects in image data. Despite its simplicity, this application epitomizes the best features of combinatorial graph cuts methods in vision: global optima, practical efficiency, numerical robustness, ability to fuse a wide range of visual cues and constraints, unrestricted topological properties of segments, and applicability to N-D problems. Graph cuts based approaches to object extraction have also been shown to have interesting connections with earlier segmentation methods such as snakes, geodesic active contours, and level-sets. The segmentation energies optimized by graph cuts combine boundary regularization with region-based properties in the same fashion as Mumford-Shah style functionals. We present motivation and detailed technical description of the basic combinatorial optimization framework for image segmentation via s/t graph cuts. After the general concept of using binary graph cut algorithms for object segmentation was first proposed and tested in Boykov and Jolly (2001), this idea was widely studied in computer vision and graphics communities. We provide links to a large number of known extensions based on iterative parameter re-estimation and learning, multi-scale or hierarchical approaches, narrow bands, and other techniques for demanding photo, video, and medical applications.
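To make the s/t graph cut formulation concrete, the sketch below segments a single row of pixels into object and background by computing a minimum cut. It is a minimal sketch under stated assumptions rather than the construction from the paper: the max-flow routine is a plain Edmonds-Karp implementation, the region costs are squared differences from assumed foreground and background mean intensities, and the boundary term is a constant `smoothness` weight. All function and parameter names are chosen here for illustration.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (flow value, set of nodes still
    reachable from the source in the residual graph)."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edges

    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def segment_row(intensities, fg_mean, bg_mean, smoothness):
    """Binary segmentation of a 1-D row: 1 = object (source side), 0 = background."""
    n = len(intensities)
    cap = {"s": {}, "t": {}}
    for i in range(n):
        cap[i] = {}
    for i, value in enumerate(intensities):
        # Region terms (t-links): cutting s->i labels pixel i as background,
        # cutting i->t labels it as object.
        cap["s"][i] = (value - bg_mean) ** 2
        cap[i]["t"] = (value - fg_mean) ** 2
        # Boundary term (n-links): paid when neighbouring labels differ.
        if i + 1 < n:
            cap[i][i + 1] = smoothness
            cap[i + 1][i] = smoothness
    _, source_side = min_cut_source_side(cap, "s", "t")
    return [1 if i in source_side else 0 for i in range(n)]


# Bright object on the left, dark background on the right, one dim pixel inside.
mask = segment_row([9, 8, 4, 9, 1, 0], fg_mean=10, bg_mean=0, smoothness=30)
```

In this example the dim third pixel would be labeled background by its region cost alone, but the boundary term makes it cheaper to keep it inside the object, which is the interplay of region and boundary terms the abstract describes.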
Contour-based learning for object detection
- In Proceedings, International Conference on Computer Vision, 2005
- Cited by 70 (1 self)
We present a novel categorical object detection scheme that uses only local contour-based features. A two-stage, partially supervised learning architecture is proposed: a rudimentary detector is learned from a very small set of segmented images and applied to a larger training set of unsegmented images; the second stage bootstraps these detections to learn an improved classifier while explicitly training against clutter. The detectors are learned with a boosting algorithm which creates a location-sensitive classifier using a discriminative set of features from a randomly chosen dictionary of contour fragments. We present results that are very competitive with other state-of-the-art object detection schemes and show robustness to object articulations, clutter, and occlusion. Our major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.
An iterative optimization approach for unified image segmentation and matting
- In ICCV, 2005
- Cited by 63 (2 self)
Separating a foreground object from the background in a static image involves determining both full and partial pixel coverages, also known as extracting a matte. Previous approaches require the input image to be pre-segmented into three regions: foreground, background and unknown, which is called a trimap. Partial opacity values are then computed only for pixels inside the unknown region. This pre-segmentation based approach fails for images with large portions of semi-transparent foreground where the trimap is difficult to create even manually. In this paper we combine the segmentation and matting problems and propose a unified optimization approach based on Belief Propagation. We iteratively estimate the opacity value for every pixel in the image, based on a small sample of foreground and background pixels marked by the user. Experimental results show that, compared with previous approaches, our method extracts high quality mattes more efficiently for foregrounds with significant semi-transparent regions.
Video object cut and paste
- ACM Transactions on Graphics, 2005
- Cited by 51 (1 self)
Robust Higher Order Potentials for Enforcing Label Consistency
- 2009
- Cited by 49 (9 self)
This paper proposes a novel framework for labelling problems which is able to combine multiple segmentations in a principled manner. Our method is based on higher order conditional random fields and uses potentials defined on sets of pixels (image segments) generated using unsupervised segmentation algorithms. These potentials enforce label consistency in image regions and can be seen as a generalization of the commonly used pairwise contrast sensitive smoothness potentials. The higher order potential functions used in our framework take the form of the Robust P^n model and are more general than the P^n Potts model recently proposed by Kohli et al. We prove that the optimal swap and expansion moves for energy functions composed of these potentials can be computed by solving an st-mincut problem. This enables the use of powerful graph cut based move making algorithms for performing inference in the framework. We test our method on the problem of multi-class object segmentation by augmenting the conventional CRF used for object segmentation with higher order potentials defined on image regions. Experiments on challenging data sets show that integration of higher order potentials quantitatively and qualitatively improves results leading to much better definition of object boundaries. We
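The difference between a strict and a robust label-consistency potential on an image segment can be sketched as below. The abstract does not give the formulas, so this is only an assumed simplified form: the strict version charges a fixed cost `gamma_max` as soon as any pixel in the segment disagrees, while the robust version grows linearly with the number of pixels that deviate from the dominant label and is truncated at `gamma_max` once `q` pixels deviate. The names `pn_potts`, `robust_pn`, `gamma_max` and `q` are chosen here for illustration, and label-dependent weights are omitted.

```python
from collections import Counter


def pn_potts(labels, gamma_max):
    """Strict consistency: zero cost only if every pixel in the segment
    takes the same label, otherwise the full penalty."""
    return 0.0 if len(set(labels)) <= 1 else gamma_max


def robust_pn(labels, gamma_max, q):
    """Robust consistency (assumed simplified form): the cost rises linearly
    with the number of pixels not taking the dominant label and is capped at
    gamma_max once q or more pixels deviate. Requires q > 0."""
    if not labels:
        return 0.0
    dominant_count = Counter(labels).most_common(1)[0][1]
    n_deviating = len(labels) - dominant_count
    return min(n_deviating * gamma_max / q, gamma_max)


# A ten-pixel segment that is mostly "cow" with one stray "grass" pixel.
segment = ["cow"] * 9 + ["grass"]
strict_cost = pn_potts(segment, gamma_max=10.0)
robust_cost = robust_pn(segment, gamma_max=10.0, q=5)
```

With one stray pixel the strict potential already pays the full penalty, whereas the robust form pays only a fraction, which is how a segment produced by an imperfect unsupervised segmentation can still encourage consistency without forcing it.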
Bilayer segmentation of live video
- In IEEE Conference on Computer Vision and Pattern Recognition, 2006
- Cited by 48 (3 self)
Figure 1: An example of automatic foreground/background segmentation in monocular image sequences: (a) input sequence; (b) automatic layer separation and background substitution in three different frames. Despite the challenging foreground motion the person is accurately extracted from the sequence and then composited free of aliasing upon a different background; a useful tool in video-conferencing applications. The sequences and ground truth data used throughout this paper are available from [1].
This paper presents an algorithm capable of real-time separation of foreground from background in monocular video sequences. Automatic segmentation of layers from colour/contrast or from motion alone is known to be error-prone. Here motion, colour and contrast cues are probabilistically fused together with spatial and temporal priors to infer layers accurately and efficiently. Central to our algorithm is the fact that pixel velocities are not needed, thus removing the need for optical flow estimation, with its tendency to error and computational expense. Instead, an efficient motion vs nonmotion classifier is trained to operate directly and jointly on intensity-change and contrast. Its output is then fused with colour information. The prior on segmentation is represented by a second order, temporal, Hidden Markov Model, together with a spatial MRF favouring coherence except where contrast is high. Finally, accurate layer segmentation and explicit occlusion detection are efficiently achieved by binary graph cut. The segmentation accuracy of the proposed algorithm is quantitatively evaluated with respect to existing ground-truth data and found to be comparable to the accuracy of a state-of-the-art stereo segmentation algorithm. Foreground/background segmentation is demonstrated in the application of live background substitution and shown to generate convincingly good quality composite video.

