Video google: A text retrieval approach to object matching in videos
In Proc. ICCV, 2003
Cited by 550 (24 self)
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching on two full length feature films.
Scalable Recognition with a Vocabulary Tree
In CVPR, 2006
Cited by 374 (0 self)
A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CD-covers from a database of 40000 images of popular music CD's. The scheme
Feature detection with automatic scale selection
International Journal of Computer Vision, 1998
Cited by 349 (25 self)
The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select local appropriate scales for further analysis. This article proposes a systematic methodology for dealing with this problem. A framework is proposed for generating hypotheses about interesting scale levels in image data, based on a general principle stating that local extrema over scales of different combinations of γ-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown how this idea can be used as a major mechanism in algorithms for automatic scale selection, which
Viewpoint Invariant Texture Matching and Wide Baseline Stereo
In Proc. ICCV, 2001
Cited by 77 (7 self)
We describe and demonstrate a texture region descriptor which is invariant to affine geometric and photometric transformations, and insensitive to the shape of the texture region. It is applicable to texture patches which are locally planar and have stationary statistics. The novelty of the descriptor is that it is based on statistics aggregated over the region, resulting in richer and more stable descriptors than those computed at a point. Two texture matching applications of this descriptor are demonstrated: (1) it is used to automatically identify regions of the same type of texture, but with varying surface pose, within a single image
A statistical approach to texture classification from single images
International Journal of Computer Vision, 2005
Cited by 72 (6 self)
We investigate texture classification from single images obtained under unknown viewpoint and illumination. A statistical approach is developed where textures are modelled by the joint probability distribution of filter responses. This distribution is represented by the frequency histogram of filter response cluster centres (textons). Recognition proceeds from single, uncalibrated images and the novelty here is that rotationally invariant filters are used and the filter response space is low dimensional. Classification performance is compared with the filter banks and methods of
A Sparse Texture Representation Using Affine-Invariant Regions
In Proc. CVPR, 2003
Cited by 57 (9 self)
This paper introduces a texture representation suitable for recognizing images of textured surfaces under a wide range of transformations, including viewpoint changes and nonrigid deformations. At the feature extraction stage, a sparse set of affine-invariant local patches is extracted from the image. This spatial selection process permits the computation of characteristic scale and neighborhood shape for every texture element. The proposed texture representation is evaluated in retrieval and classification tasks using the entire Brodatz database and a collection of photographs of textured surfaces taken from different viewpoints.
Coherence-Enhancing Diffusion Filtering
1999
Cited by 52 (2 self)
The completion of interrupted lines or the enhancement of flow-like structures is a challenging task in computer vision, human vision, and image processing. We address this problem by presenting a multiscale method in which a nonlinear diffusion filter is steered by the so-called interest operator (second-moment matrix, structure tensor). An m-dimensional formulation of this method is analysed with respect to its well-posedness and scale-space properties. An efficient scheme is presented which uses a stabilization by a semi-implicit additive operator splitting (AOS), and the scale-space behaviour of this method is illustrated by applying it to both 2-D and 3-D images.
Shape-adapted smoothing in estimation of 3-D shape cues from affine distortions of local 2-D brightness structure
2001
Cited by 32 (3 self)
This article describes a method for reducing the shape distortions due to scale-space smoothing that arise in the computation of 3-D shape cues using operators (derivatives) defined from scale-space representation. More precisely, we are concerned with a general class of methods for deriving 3-D shape cues from 2-D image data based on the estimation of locally linearized deformations of brightness patterns. This class
Fingerprint Enhancement by Shape Adaptation of Scale-Space Operators with Automatic Scale Selection
Cited by 31 (9 self)
This work presents two mechanisms for processing fingerprint images; shape-adapted smoothing based on second moment descriptors and automatic scale selection based on normalized derivatives. The shape adaptation procedure adapts the smoothing operation to the local ridge structures, which allows interrupted ridges to be joined without destroying essential singularities such as branching points and enforces continuity of their directional fields. The scale selection procedure estimates local ridge width and adapts the amount of smoothing to the local amount of noise. In addition, a ridgeness measure is defined, which reflects how well the local image structure agrees with a qualitative ridge model, and is used for spreading the results of shape adaptation into noisy areas. The combined approach makes it possible to resolve fine scale structures in clear areas while reducing the risk of enhancing noise in blurred or fragmented areas. The result is a reliable and adaptively detailed estimate of the ridge orientation field and ridge width, as well as a smoothed grey-level version of the input image. We propose that these general techniques should be of interest to developers of automatic fingerprint identification
Video Google: Efficient visual search of videos
In Toward Category-Level Object Recognition, volume 4170 of LNCS, 2006
Cited by 27 (0 self)
We describe an approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable. Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. This requires a visual analogy of a word which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google. We report results for object retrieval on the full length feature films

