Results 1 - 10
of
57
Object class recognition by unsupervised scale-invariant learning
- In CVPR
, 2003
"... We present a method to learn and recognize object class models from unlabeled and unsegmented cluttered scenes in a scale invariant manner. Objects are modeled as flexible constellations of parts. A probabilistic representation is used for all aspects of the object: shape, appearance, occlusion and ..."
Abstract
-
Cited by 646 (35 self)
- Add to MetaCart
We present a method to learn and recognize object class models from unlabeled and unsegmented cluttered scenes in a scale invariant manner. Objects are modeled as flexible constellations of parts. A probabilistic representation is used for all aspects of the object: shape, appearance, occlusion and relative scale. An entropy-based feature detector is used to select regions and their scale within the image. In learning the parameters of the scale-invariant object model are estimated. This is done using expectation-maximization in a maximum-likelihood setting. In recognition, this model is used in a Bayesian manner to classify images. The flexible nature of the model is demonstrated by excellent results over a range of datasets including geometrically constrained classes (e.g. faces, cars) and flexible objects (such as animals). 1.
A bayesian hierarchical model for learning natural scene categories
- In CVPR
, 2005
"... We propose a novel approach to learn and recognize natural scene categories. Unlike previous work [9, 17], it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region ..."
Abstract
-
Cited by 321 (11 self)
- Add to MetaCart
We propose a novel approach to learn and recognize natural scene categories. Unlike previous work [9, 17], it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a “theme”. In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performances on a large set of 13 categories of complex scenes. 1.
Learning object categories from google’s image search
- In ICCV
, 2005
"... Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Intern ..."
Abstract
-
Cited by 154 (11 self)
- Add to MetaCart
Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned by search engines. We evaluate the models on standard test sets, showing performance competitive with existing methods trained on hand prepared datasets. 1.
A sparse object category model for efficient learning and exhaustive recognition
- In CVPR
, 2005
"... We present a “parts and structure ” model for object category recognition that can be learnt efficiently and in a weakly-supervised manner: the model is learnt from example images containing category instances, without requiring segmentation from background clutter. The model is a sparse representat ..."
Abstract
-
Cited by 96 (7 self)
- Add to MetaCart
We present a “parts and structure ” model for object category recognition that can be learnt efficiently and in a weakly-supervised manner: the model is learnt from example images containing category instances, without requiring segmentation from background clutter. The model is a sparse representation of the object, and consists of a star topology configuration of parts modeling the output of a variety of feature detectors. The optimal choice of feature types (whose repertoire includes interest points, curves and regions) is made automatically. In recognition, the model may be applied efficiently in a complete manner, bypassing the need for feature detectors, to give the globally optimal match within a query image. The approach is demonstrated on a wide variety of categories, and delivers both successful classification and localization of the object within the image. 1
An Affine Invariant Salient Region Detector
, 2004
"... In this paper we describe a novel technique for detecting salient regions in an image. The detector is a generalization to a#ne invariance of the method introduced by Kadir and Brady [10]. The detector deems a region salient if it exhibits unpredictability in both its attributes and its spatial ..."
Abstract
-
Cited by 90 (4 self)
- Add to MetaCart
In this paper we describe a novel technique for detecting salient regions in an image. The detector is a generalization to a#ne invariance of the method introduced by Kadir and Brady [10]. The detector deems a region salient if it exhibits unpredictability in both its attributes and its spatial scale.
A Visual Category Filter for Google Images
- In Proc. ECCV
, 2004
"... Abstract. We extend the constellation model to include heterogeneous parts which may represent either the appearance or the geometry of a region of the object. The parts and their spatial configuration are learnt simultaneously and automatically, without supervision, from cluttered images. We descri ..."
Abstract
-
Cited by 63 (7 self)
- Add to MetaCart
Abstract. We extend the constellation model to include heterogeneous parts which may represent either the appearance or the geometry of a region of the object. The parts and their spatial configuration are learnt simultaneously and automatically, without supervision, from cluttered images. We describe how this model can be employed for ranking the output of an image search engine when searching for object categories. It is shown that visual consistencies in the output images can be identified, and then used to rank the images according to their closeness to the visual object category. Although the proportion of good images may be small, the algorithm is designed to be robust and is capable of learning in either a totally unsupervised manner, or with a very limited amount of supervision. We demonstrate the method on image sets returned by Google’s image search for a number of object categories including bottles, camels, cars, horses, tigers and zebras.
Scale-Invariant Object Categorization using a Scale-Adaptive Mean-Shift Search
, 2004
"... The goal of our work is object categorization in real-world scenes. That is, given a novel image we want to recognize and localize unseen-before objects based on their similarity to a learned object category. For use in a real-world system, it is important that this includes the ability to recognize ..."
Abstract
-
Cited by 61 (6 self)
- Add to MetaCart
The goal of our work is object categorization in real-world scenes. That is, given a novel image we want to recognize and localize unseen-before objects based on their similarity to a learned object category. For use in a real-world system, it is important that this includes the ability to recognize objects at multiple scales. In this paper, we present an approach to multi-scale object categorization using scale-invariant interest points and a scale-adaptive Mean-Shift search. The approach builds on the method from (Leibe & Schiele, BMVC'03; Leibe et al., SLCV'03), which has been demonstrated to achieve excellent results for the single-scale case, and extends it to multiple scales. We present an experimental comparison of the influence of different interest point operators and quantitatively show the method's robustness to large scale changes.
A sparse texture representation using local affine regions
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
"... This article introduces a texture representation suitable for recognizing images of textured surfaces under a wide range of transformations, including viewpoint changes and non-rigid deformations. At the feature extraction stage, a sparse set of affine Harris and Laplacian regions is found in the im ..."
Abstract
-
Cited by 60 (11 self)
- Add to MetaCart
This article introduces a texture representation suitable for recognizing images of textured surfaces under a wide range of transformations, including viewpoint changes and non-rigid deformations. At the feature extraction stage, a sparse set of affine Harris and Laplacian regions is found in the image. Each of these regions can be thought of as a texture element having a characteristic elliptic shape and a distinctive appearance pattern. This pattern is captured in an affine-invariant fashion via a process of shape normalization followed by the computation of two novel descriptors, the spin image and the RIFT descriptor. When affine invariance is not required, the original elliptical shape serves as an additional discriminative feature for texture recognition. The proposed approach is evaluated in retrieval and classi-fication tasks using the entire Brodatz database and a publicly available collection of 1000 photographs of textured surfaces taken from different viewpoints.
Speeded-Up Robust Features (SURF)
, 2008
"... This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faste ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper encompasses a detailed description of the detector and descriptor and then explores the effect of the most important parameters. We conclude the article with SURF’s application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF’s usefulness in a broad range of topics in computer vision.
L.: Spatially coherent latent topic model for concurrent object segmentation and classification
- In: Proceedings of IEEE International Conference on Computer Vision
, 2007
"... A. Input Image B. LDA initialized topics C. LDA learned topics D. LDA learned object We present a novel generative model for simultaneously recognizing and segmenting object and scene classes. Our model is inspired by the traditional bag of words representation of texts and images as well as a numbe ..."
Abstract
-
Cited by 42 (3 self)
- Add to MetaCart
A. Input Image B. LDA initialized topics C. LDA learned topics D. LDA learned object We present a novel generative model for simultaneously recognizing and segmenting object and scene classes. Our model is inspired by the traditional bag of words representation of texts and images as well as a number of related generative models, including probabilistic Latent Sematic Analysis (pLSA) and Latent Dirichlet Allocation (LDA). A major drawback of the pLSA and LDA models is the assumption that each patch in the image is independently generated given its corresponding latent topic. While such representation provide an efficient computational method, it lacks the power to describe the visually coherent images and scenes. Instead, we propose a spatially coherent latent topic model (Spatial-LTM). Spatial-LTM represents an image containing objects in a hierarchical way by oversegmented image regions of homogeneous appearances and the salient image patches within the regions. Only one single latent topic is assigned to the image patches within each region, enforcing the spatial coherency of the model. This idea gives rise to the following merits of Spatial-LTM: (1) Spatial-LTM provides a unified representation for spatially coherent bag of words topic models; (2) Spatial-LTM can simultaneously segment and classify objects, even in the case of occlusion and multiple instances; and (3) Spatial-LTM can be trained either unsupervised or supervised, as well as when partial object labels are provided. We verify the success of our model in a number of segmentation and classification experiments. E. Coherent regions for

