Results 1 - 10 of 215
Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003. Cited by 1621 (42 self).
Abstract: We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching on two full length feature films.
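The text-retrieval analogy described here (vector-quantized descriptors as visual words, an inverted file, and document-style ranking) can be sketched roughly as below. The vocabulary is assumed to come from an offline k-means step, and the tf-idf weighting and scoring are illustrative assumptions rather than the paper's exact ranking scheme.

```python
# Sketch of the "visual words" retrieval idea: quantize local descriptors
# against a pre-built vocabulary, index frames in an inverted file, and
# rank frames by a tf-idf-weighted similarity. All parameters are assumed.
import numpy as np
from collections import defaultdict

def quantize(descriptors, vocabulary):
    """Assign each descriptor (N x D) to its nearest visual word (K x D)."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)                      # one word id per descriptor

def build_index(frame_descriptors, vocabulary):
    """frame_descriptors: dict frame_id -> (N x D) array of local descriptors."""
    K = len(vocabulary)
    inverted = defaultdict(set)                   # word id -> frames containing it
    tf = {}                                       # frame id -> term-frequency vector
    for fid, desc in frame_descriptors.items():
        words = quantize(desc, vocabulary)
        hist = np.bincount(words, minlength=K).astype(float)
        tf[fid] = hist / max(hist.sum(), 1)
        for w in np.unique(words):
            inverted[w].add(fid)
    n = len(frame_descriptors)
    df = np.array([len(inverted[w]) for w in range(K)], dtype=float)
    idf = np.log(n / np.maximum(df, 1))
    return inverted, tf, idf

def query(query_desc, vocabulary, inverted, tf, idf):
    words = quantize(query_desc, vocabulary)
    q = np.bincount(words, minlength=len(vocabulary)).astype(float) * idf
    candidates = set().union(*(inverted[w] for w in np.unique(words)))
    scores = {fid: float(np.dot(q, tf[fid] * idf)) for fid in candidates}
    return sorted(scores, key=scores.get, reverse=True)   # ranked frame ids
```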
An affine invariant interest point detector. In Proceedings of the 7th European Conference on Computer Vision, 2002. Cited by 1455 (55 self).
Abstract: This paper presents a novel approach for detecting affine invariant interest points. Our method can deal with significant affine transformations including large scale changes. Such transformations introduce significant changes in the point location as well as in the scale and the shape of the neighbourhood of an interest point. Our approach solves these problems simultaneously. It is based on three key ideas: 1) The second moment matrix computed at a point can be used to normalize a region in an affine invariant way (skew and stretch). 2) The scale of the local structure is indicated by local extrema of normalized derivatives over scale. 3) An affine-adapted Harris detector determines the location of interest points. A multi-scale version of this detector is used for initialization. An iterative algorithm then modifies the location, scale and neighbourhood of each point and converges to affine invariant points. For matching and recognition, the image is characterized by a set of affine invariant points; the affine transformation associated with each point allows the computation of an affine invariant descriptor which is also invariant to affine illumination changes. A quantitative comparison of our detector with existing ones shows a significant improvement in the presence of large affine deformations. Experimental results for wide baseline matching show excellent performance in the presence of large perspective transformations including significant scale changes. Results for recognition are very good for a database with more than 5000 images.
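A minimal sketch of the first two ingredients listed above, under assumed Gaussian scales: the second moment matrix computed from smoothed gradient products, and the Harris measure derived from it. The scipy-based implementation and the constant k = 0.04 are illustrative choices, not values from the paper.

```python
# Sketch: second moment matrix at integration scale sigma_i from derivatives
# at differentiation scale sigma_d, plus the Harris corner measure derived
# from it. Scales and the constant k are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def second_moment_matrix(img, sigma_d=1.0, sigma_i=2.0):
    # image gradients at the differentiation scale
    Ix = gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = gaussian_filter(img, sigma_d, order=(1, 0))
    # smooth the gradient products at the integration scale: mu = [[A, C], [C, B]]
    A = gaussian_filter(Ix * Ix, sigma_i)
    B = gaussian_filter(Iy * Iy, sigma_i)
    C = gaussian_filter(Ix * Iy, sigma_i)
    return A, B, C

def harris_response(img, k=0.04, **scales):
    A, B, C = second_moment_matrix(img, **scales)
    det = A * B - C * C
    trace = A + B
    return det - k * trace ** 2       # local maxima are candidate interest points

# The affine adaptation step would then use mu^(-1/2) at each detected point
# to deskew its neighbourhood, iterating until mu becomes isotropic.
```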
A comparison of affine region detectors. International Journal of Computer Vision, 2005. Cited by 363 (20 self).
Abstract: The paper gives a snapshot of the state of the art in affine covariant region detectors, and compares their performance on a set of test images under varying imaging conditions. Six types of detectors are included: detectors based on affine normalization around Harris [24, 34] and Hessian points [24], as proposed by Mikolajczyk and Schmid and by Schaffalitzky and Zisserman; a detector of ‘maximally stable extremal regions’, proposed by Matas et al. [21]; an edge-based region detector [45] and a detector based on intensity extrema [47], proposed by Tuytelaars and Van Gool; and a detector of ‘salient regions’, proposed by Kadir, Zisserman and Brady [12]. The performance is measured against changes in viewpoint, scale, illumination, defocus and image compression. The objective of this paper is also to establish a reference test set of images and performance software, so that future detectors can be evaluated in the same framework.
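The repeatability criterion such comparisons typically rely on can be sketched as follows: two regions correspond if the overlap error of their ellipses, after mapping one image's regions into the other with the ground-truth homography, is below a threshold. The rasterized overlap computation, the 40% threshold, and the normalization by the first image's region count are illustrative assumptions, not the evaluation protocol's exact definition.

```python
# Sketch of the overlap-error / repeatability measure for affine regions,
# each region given as (center, M) with ellipse (x - c)^T M (x - c) <= 1.
import numpy as np

def ellipse_mask(center, M, grid):
    """Boolean mask over an (N, 2) grid of points inside the ellipse."""
    d = grid - center
    return np.einsum('ni,ij,nj->n', d, M, d) <= 1.0

def overlap_error(c1, M1, c2, M2, span=50.0, step=0.5):
    xs = np.arange(-span, span, step)
    grid = np.stack(np.meshgrid(xs + c1[0], xs + c1[1]), -1).reshape(-1, 2)
    a = ellipse_mask(c1, M1, grid)
    b = ellipse_mask(c2, M2, grid)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return 1.0 - inter / max(union, 1)

def repeatability(regions1, regions2_mapped, max_error=0.4):
    """Fraction of image-1 regions with a counterpart (regions of image 2
    already mapped into image 1) whose overlap error is below max_error."""
    hits = 0
    for c1, M1 in regions1:
        if any(overlap_error(c1, M1, c2, M2) < max_error
               for c2, M2 in regions2_mapped):
            hits += 1
    return hits / max(len(regions1), 1)
```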
Content-based multimedia information retrieval: State of the art and challenges. ACM Trans. Multimedia Comput. Commun. Appl., 2006. Cited by 298 (12 self).
Abstract: Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions, which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high-performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.
Keypoint recognition using randomized trees. IEEE Trans. Pattern Anal. Mach. Intell. Cited by 214 (18 self).
Abstract: In many 3D object-detection and pose-estimation problems, run-time performance is of critical importance. However, there usually is time to train the system, which we will show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. The resulting algorithm is robust, accurate, and fast enough for frame-rate performance. This reduction in run-time computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that can be expected to produce very repeatable results in general, which usually is very time-consuming, we simply find the most repeatable object keypoints for the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, non-planar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others.
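A toy sketch of the classification formulation: each model keypoint is a class, patches are routed through randomized trees of simple binary pixel-intensity comparisons, and the leaf class histograms gathered during training are averaged at test time. Patch size, tree depth, tree count, and the particular binary test are illustrative assumptions rather than the paper's settings.

```python
# Sketch: keypoint matching as classification with randomized trees built
# from random binary intensity comparisons. Parameters are assumed values.
import numpy as np

PATCH, DEPTH, N_TREES = 32, 10, 20          # assumed patch size, depth, tree count

def make_tree(rng):
    """Each internal node of a complete binary tree of depth DEPTH holds one
    random test comparing the intensities of two patch pixels."""
    n_internal = 2 ** DEPTH - 1
    return {"tests": [(tuple(rng.integers(0, PATCH, 2)),
                       tuple(rng.integers(0, PATCH, 2)))
                      for _ in range(n_internal)],
            "leaves": {}}

def leaf_index(patch, tree):
    node = 0
    for _ in range(DEPTH):
        p1, p2 = tree["tests"][node]
        node = 2 * node + 1 + int(patch[p1] < patch[p2])   # go left or right
    return node - (2 ** DEPTH - 1)                         # leaf id in [0, 2^DEPTH)

def train(trees, patches, labels, n_classes):
    for tree in trees:
        for patch, y in zip(patches, labels):
            hist = tree["leaves"].setdefault(leaf_index(patch, tree),
                                             np.zeros(n_classes))
            hist[y] += 1

def classify(trees, patch, n_classes):
    post = np.zeros(n_classes)
    for tree in trees:
        hist = tree["leaves"].get(leaf_index(patch, tree))
        if hist is not None:
            post += hist / hist.sum()        # average the leaf posteriors
    return int(post.argmax())                # most likely model keypoint

rng = np.random.default_rng(0)
trees = [make_tree(rng) for _ in range(N_TREES)]
```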
Randomized trees for realtime keypoint recognition. In Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 2005. Cited by 167 (5 self).
Abstract: In earlier work, we proposed treating wide baseline matching of feature points as a classification problem, in which each class corresponds to the set of all possible views of such a point. We used a K-means plus nearest-neighbour classifier to validate our approach, mostly because it was simple to implement. It proved effective but was still too slow for real-time use. In this paper, we advocate instead the use of randomized trees as the classification technique. It is both fast enough for real-time performance and more robust. It also gives us a principled way not only to match keypoints but to select, during a training phase, those that are the most recognizable. This results in a real-time system able to detect planar, non-planar, and even deformable objects and to position them in 3D. It is robust to illumination changes, scale changes and occlusions.
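The K-means plus nearest-neighbour baseline mentioned above can be sketched as compressing the views of each keypoint class to a few prototype patches and assigning a new patch the class of its nearest prototype. The number of prototypes and the use of scipy's kmeans2 are assumptions for illustration.

```python
# Sketch of a K-means + nearest-neighbour keypoint classifier: a few
# prototype patches per class, nearest prototype decides the class.
import numpy as np
from scipy.cluster.vq import kmeans2

def build_prototypes(views_per_class, n_prototypes=5):
    """views_per_class: list (one entry per keypoint class) of (N x D) arrays
    of flattened, normalized patches sampled from views of that keypoint."""
    protos, labels = [], []
    for cls, views in enumerate(views_per_class):
        centroids, _ = kmeans2(views.astype(float), n_prototypes, minit='points')
        protos.append(centroids)
        labels.extend([cls] * len(centroids))
    return np.vstack(protos), np.array(labels)

def classify(patch, protos, labels):
    d2 = ((protos - patch[None, :]) ** 2).sum(axis=1)
    return int(labels[d2.argmin()])          # nearest prototype decides the class
```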
Invariant Features from Interest Point Groups. In British Machine Vision Conference, 2002. Cited by 153 (2 self).
Abstract: This paper approaches the problem of finding correspondences between images in which there are large changes in viewpoint, scale and illumination. Recent work has shown that scale-space 'interest points' may be found with good repeatability in spite of such changes. Furthermore, the high entropy of the surrounding image regions means that local descriptors are highly discriminative for matching. For descriptors at interest points to be robustly matched between images, they must be as far as possible invariant to the imaging process.
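One common way of obtaining the repeatable scale-space interest points referred to here is to take local extrema of a difference-of-Gaussian pyramid; a rough sketch follows, with the scale sampling and response threshold as illustrative assumptions rather than the paper's own detector.

```python
# Sketch: scale-space interest points as extrema of a difference-of-Gaussian
# stack. Scale values and the response threshold are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_interest_points(img, sigmas=(1.0, 1.6, 2.6, 4.1, 6.6), thresh=0.02):
    img = img.astype(float) / max(img.max(), 1)
    blurred = [gaussian_filter(img, s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    # keep points that are extrema over their 3x3x3 scale-space neighbourhood
    # and whose |DoG| response exceeds the threshold
    maxima = (dog == maximum_filter(dog, size=3)) & (dog > thresh)
    minima = (dog == minimum_filter(dog, size=3)) & (dog < -thresh)
    s, y, x = np.nonzero(maxima | minima)
    return list(zip(y, x, [sigmas[i] for i in s]))   # (row, col, scale)
```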
Monocular model-based 3D tracking of rigid objects: A survey. Foundations and Trends in Computer Graphics and Vision, 2005. Cited by 142 (4 self).
Abstract: Many applications require tracking of complex 3D objects. These include visual servoing of robotic arms on specific target objects, Augmented Reality systems that require real-time registration of the object to be augmented, and head tracking systems that sophisticated interfaces can use. Computer Vision offers solutions that are cheap, practical and non-invasive. This survey reviews the different techniques and approaches that have been developed by industry and research. First, important mathematical tools are introduced: camera representation, robust estimation and uncertainty estimation. Then a comprehensive study is given of the numerous approaches developed by the Augmented Reality and Robotics communities, beginning with those that are based on point or planar fiducial marks and moving on to those that avoid the need to engineer the environment by relying on natural features such as edges, texture or interest points. Recent advances that avoid manual initialization and failures due to fast motion are also presented. The survey concludes with the different possible choices that should be made when implementing a 3D tracking system and a discussion of the future of vision-based 3D tracking. Because it encompasses many computer vision techniques from low-level vision to 3D geometry and includes a comprehensive study of the massive literature on the subject, this survey should be the handbook of the student, the researcher, or the engineer who wants to implement a 3D tracking system.
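As one concrete example of the robust-estimation tools such a tracking system builds on, the sketch below estimates the pose of a known rigid object from 2D-3D correspondences with a RANSAC-wrapped PnP solver. The use of OpenCV and the tolerance values are assumptions for illustration, not choices made in the survey.

```python
# Sketch: robust pose estimation from 2D-3D correspondences via RANSAC + PnP.
import numpy as np
import cv2

def estimate_pose(model_points, image_points, K):
    """model_points: (N, 3) object points; image_points: (N, 2) matched 2D
    detections; K: 3x3 camera intrinsics. Returns R, t and inlier indices."""
    dist = np.zeros(5)                         # assume an undistorted camera
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        model_points.astype(np.float32),
        image_points.astype(np.float32),
        K.astype(np.float32), dist,
        reprojectionError=3.0, iterationsCount=200)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> 3x3 matrix
    return R, tvec, inliers.ravel()
```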
Evaluation of features detectors and descriptors based on 3D objects. IJCV, 2005. Cited by 136 (2 self).
Abstract: We explore the performance of a number of popular feature detectors and descriptors in matching 3D object features across viewpoints and lighting conditions. To this end we design a method, based on intersecting epipolar constraints, for providing ground truth correspondence automatically. We collect a database of 100 objects viewed from 144 calibrated viewpoints under three different lighting conditions. We find that the combination of the Hessian-affine feature finder and SIFT features is most robust to viewpoint change. Harris-affine combined with SIFT and Hessian-affine combined with shape context descriptors were best for lighting changes and scale changes, respectively. We also find that no detector-descriptor combination performs well with viewpoint changes of more than 25-30°.
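The epipolar-constraint idea behind the automatic ground truth can be sketched as follows: a candidate match is accepted only if the point in the second image lies close to the epipolar lines induced both by the first view and by an auxiliary calibrated view. The pixel tolerance is an illustrative assumption.

```python
# Sketch: intersecting epipolar constraints as an automatic correspondence check.
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance (pixels) of x2 in image 2 from the epipolar line F @ [x1, 1]."""
    x1_h = np.array([x1[0], x1[1], 1.0])
    l = F @ x1_h                               # line a*x + b*y + c = 0 in image 2
    return abs(l[0] * x2[0] + l[1] * x2[1] + l[2]) / np.hypot(l[0], l[1])

def consistent(F_12, F_32, x1, x3, x2, tol=1.0):
    """x2 is accepted as the correspondence of x1 (view 1) only if it also
    satisfies the epipolar constraint from an auxiliary calibrated view 3."""
    return (epipolar_distance(F_12, x1, x2) < tol and
            epipolar_distance(F_32, x3, x2) < tol)
```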
DAISY: An efficient dense descriptor applied to wide baseline stereo. IEEE Trans. Pattern Analysis and Machine Intelligence, 2010. Cited by 121 (13 self).
Abstract: In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel- and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired by earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance when used densely. It is important to note that our approach is the first algorithm that attempts to estimate dense depth maps from wide-baseline image pairs, and we show that it performs well through many experiments on depth estimation accuracy and occlusion detection, and through comparisons against other descriptors on laser-scanned ground-truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations, and our experiments support our claim of robustness against these.
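A rough sketch of the kind of construction DAISY uses: half-rectified gradient orientation maps are blurred with Gaussians of increasing size and sampled on concentric rings around each pixel, so dense computation amortizes the convolutions. The number of orientations, ring radii, and Gaussian sizes below are illustrative assumptions, not the descriptor's published parameters.

```python
# Sketch of a DAISY-like dense descriptor built from blurred orientation maps.
import numpy as np
from scipy.ndimage import gaussian_filter

def daisy_like(img, n_orient=8, ring_radii=(0, 5, 10), sigmas=(2.5, 5.0, 7.5),
               n_per_ring=8):
    gy, gx = np.gradient(img.astype(float))
    angles = np.arange(n_orient) * 2 * np.pi / n_orient
    # half-rectified gradient magnitude along each orientation
    maps = [np.maximum(gx * np.cos(a) + gy * np.sin(a), 0.0) for a in angles]
    # one blurred copy of every orientation map per ring (coarser further out)
    blurred = [[gaussian_filter(m, s) for m in maps] for s in sigmas]

    def descriptor(y, x):
        parts = []
        for radius, level in zip(ring_radii, blurred):
            offsets = ([(0, 0)] if radius == 0 else
                       [(int(round(radius * np.sin(t))),
                         int(round(radius * np.cos(t))))
                        for t in np.arange(n_per_ring) * 2 * np.pi / n_per_ring])
            for dy, dx in offsets:
                h = np.array([m[y + dy, x + dx] for m in level])
                parts.append(h / (np.linalg.norm(h) + 1e-8))   # per-histogram norm
        return np.concatenate(parts)

    return descriptor      # call descriptor(y, x) for pixels away from the borders
```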