Results 1 - 10
of
897
Beyond sliding windows: Object localization by efficient subwindow search
- In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR
, 2008
"... Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, be ..."
Abstract
-
Cited by 224 (11 self)
- Add to MetaCart
(Show Context)
Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branchand-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ 2-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition. 1.
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
"... We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be repurposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks an ..."
Abstract
-
Cited by 203 (22 self)
- Add to MetaCart
(Show Context)
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be repurposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms. 1.
FAB-MAP: Probabilistic localization and mapping in the space of appearance
- JOURNAL OF FIELD ROBOTICS DOI 10.1002/ROB 914 • JOURNAL OF FIELD ROBOTICS—2009 APPEARANCE. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH
, 2008
"... This paper describes a probabilistic approach to the problem of recognizing places based on their appearance. The system we present is not limited to localization, but can determine that a new observation comes from a previously unseen place, and so augment its map. Effectively this is a SLAM system ..."
Abstract
-
Cited by 186 (13 self)
- Add to MetaCart
This paper describes a probabilistic approach to the problem of recognizing places based on their appearance. The system we present is not limited to localization, but can determine that a new observation comes from a previously unseen place, and so augment its map. Effectively this is a SLAM system in the space of appearance. Our probabilistic approach allows us to explicitly account for perceptual aliasing in the environment—identical but indistinctive observations receive a low probability of having come from the same place. We achieve this by learning a generative model of place appearance. By partitioning the learning problem into two parts, new place models can be learned online from only a single observation of a place. The algorithm complexity is linear in the number of places in the map, and is particularly suitable for online loop closure detection in mo-bile robotics.
Make3D: Learning 3D Scene Structure from a Single Still Image
"... We consider the problem of estimating detailed 3-d structure from a single still image of an unstructured environment. Our goal is to create 3-d models which are both quantitatively accurate as well as visually pleasing. For each small homogeneous patch in the image, we use a Markov Random Field (M ..."
Abstract
-
Cited by 158 (19 self)
- Add to MetaCart
We consider the problem of estimating detailed 3-d structure from a single still image of an unstructured environment. Our goal is to create 3-d models which are both quantitatively accurate as well as visually pleasing. For each small homogeneous patch in the image, we use a Markov Random Field (MRF) to infer a set of “plane parameters” that capture both the 3-d location and 3-d orientation of the patch. The MRF, trained via supervised learning, models both image depth cues as well as the relationships between different parts of the image. Other than assuming that the environment is made up of a number of small planes, our model makes no explicit assumptions about the structure of the scene; this enables the algorithm to capture much more detailed 3-d structure than does prior art, and also give a much richer experience in the 3-d flythroughs created using image-based rendering, even for scenes with significant non-vertical structure. Using this approach, we have created qualitatively correct 3-d models for 64.9 % of 588 images downloaded from the internet. We have also extended our model to produce large scale 3d models from a few images.
M.: Person re-identification by symmetry-driven accumulation of local features
- In: IEEE Conf. Computer Vision and Pattern Recognition
, 2010
"... In this paper, we present an appearance-based method for person re-identification. It consists in the extraction of features that model three complementary aspects of the human appearance: the overall chromatic content, the spatial arrangement of colors into stable regions, and the presence of recur ..."
Abstract
-
Cited by 152 (6 self)
- Add to MetaCart
(Show Context)
In this paper, we present an appearance-based method for person re-identification. It consists in the extraction of features that model three complementary aspects of the human appearance: the overall chromatic content, the spatial arrangement of colors into stable regions, and the presence of recurrent local motifs with high entropy. All this information is derived from different body parts, and weighted opportunely by exploiting symmetry and asymmetry perceptual principles. In this way, robustness against very low resolution, occlusions and pose, viewpoint and illumination changes is achieved. The approach applies to situations where the number of candidates varies continuously, considering single images or bunch of frames for each individual. It has been tested on several public benchmark datasets (ViPER, iLIDS, ETHZ), gaining new state-of-the-art performances. 1.
Brisk: Binary robust invariant scalable keypoints
- In Computer Vision (ICCV), 2011 IEEE International Conference on
, 2011
"... Effective and efficient generation of keypoints from an image is a well-studied problem in the literature and forms the basis of numerous Computer Vision applications. Established leaders in the field are the SIFT and SURF algorithms which exhibit great performance under a variety of image transform ..."
Abstract
-
Cited by 126 (1 self)
- Add to MetaCart
(Show Context)
Effective and efficient generation of keypoints from an image is a well-studied problem in the literature and forms the basis of numerous Computer Vision applications. Established leaders in the field are the SIFT and SURF algorithms which exhibit great performance under a variety of image transformations, with SURF in particular considered as the most computationally efficient amongst the highperformance methods to date. In this paper we propose BRISK 1, a novel method for keypoint detection, description and matching. A comprehensive evaluation on benchmark datasets reveals BRISK’s adaptive, high quality performance as in state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in cases). The key to speed lies in the application of a novel scale-space FAST-based detector in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood. 1.
Dense trajectories and motion boundary descriptors for action recognition
- INT. J. COMPUT. VISION
, 2013
"... ..."
Pose Tracking from Natural Features on Mobile Phones
"... In this paper we present two techniques for natural feature tracking in real-time on mobile phones. We achieve interactive frame rates of up to 20Hz for natural feature tracking from textured planar targets on current-generation phones. We use an approach based on heavily modified state-of-the-art f ..."
Abstract
-
Cited by 117 (17 self)
- Add to MetaCart
(Show Context)
In this paper we present two techniques for natural feature tracking in real-time on mobile phones. We achieve interactive frame rates of up to 20Hz for natural feature tracking from textured planar targets on current-generation phones. We use an approach based on heavily modified state-of-the-art feature descriptors, namely SIFT and Ferns. While SIFT is known to be a strong, but computationally expensive feature descriptor, Ferns classification is fast, but requires large amounts of memory. This renders both original designs unsuitable for mobile phones. We give detailed descriptions on how we modified both approaches to make them suitable for mobile phones. We present evaluations on robustness and performance on various devices and finally discuss their appropriateness for Augmented Reality applications.
Image alignment and stitching: a tutorial
, 2006
"... This tutorial reviews image alignment and image stitching algorithms. Image alignment algorithms can discover the correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of panora ..."
Abstract
-
Cited by 115 (2 self)
- Add to MetaCart
This tutorial reviews image alignment and image stitching algorithms. Image alignment algorithms can discover the correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of panoramic mosaics. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to produce