Learning the discriminative power-invariance trade-off
- In ICCV, 2007
Cited by 80 (3 self)
We investigate the problem of learning optimal descriptors for a given classification task. Many hand-crafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the trade-off that it achieves between discriminative power and invariance. Since this trade-off must vary from task to task, no single descriptor can be optimal in all situations. Our focus, in this paper, is on learning the optimal trade-off for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domain-specific kernel as a combination of base kernels corresponding to base features which achieve different levels of trade-off (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.). This leads to a convex optimisation problem with a unique global optimum which can be solved efficiently. The method is shown to achieve state-of-the-art performance on the UIUC textures, Oxford flowers and Caltech 101 datasets.
On feature combination for multiclass object classification
- In ICCV
Cited by 47 (1 self)
A key ingredient in the design of visual object classification systems is the identification of relevant class-specific aspects while being robust to intra-class variations. While this is a necessity in order to generalize beyond a given set of training images, it is also a very difficult problem due to the high variability of visual appearance within each class. In recent years substantial performance gains on challenging benchmark datasets have been reported in the literature. This progress can be attributed to two developments: the design of highly discriminative and robust image features, and the combination of multiple complementary features based on different aspects such as shape, color or texture. In this paper we study several models that aim at learning the correct weighting of different features from training data. These include multiple kernel learning as well as simple baseline methods. Furthermore, we derive ensemble methods inspired by Boosting which are easily extendable to several multiclass settings. All methods are thoroughly evaluated on object classification datasets using a multitude of feature descriptors. The key results are that even very simple baseline methods, which are orders of magnitude faster than learning techniques, are highly competitive with multiple kernel learning. Furthermore, the Boosting-type methods are found to produce consistently better results in all experiments. We provide insight into when combination methods can be expected to work and how the benefit of complementary features can be exploited most efficiently.
A fast and incremental method for loop-closure detection using bags of visual words
- Conditionally accepted for publication in IEEE Transactions on Robotics, Special Issue on Visual SLAM, 2008
Cited by 21 (5 self)
In robotic applications of visual simultaneous localization and mapping techniques, loop-closure detection and global localization are two issues that require the capacity to recognize a previously visited place from current camera measurements. We present an online method that makes it possible to detect when an image comes from an already perceived scene using local shape and color information. Our approach extends the bag-of-words method used in image classification to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability. We demonstrate the efficiency of our solution by real-time loop-closure detection under strong perceptual aliasing conditions in both indoor and outdoor image sequences taken with a handheld camera. Index Terms: loop-closure detection, localization, SLAM.
What does classifying more than 10,000 image categories tell us?
Cited by 13 (3 self)
Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.
Real-time visual loop-closure detection
- In IEEE International Conference on Robotics and Automation (ICRA), 2008
Cited by 6 (5 self)
In robotic applications of visual simultaneous localization and mapping, loop-closure detection and global localization are two issues that require the capacity to recognize a previously visited place from current camera measurements. We present an online method that makes it possible to detect when an image comes from an already perceived scene using local shape information. Our approach extends the bag of visual words method used in image recognition to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability. We demonstrate the efficiency of our solution by real-time loop-closure detection under strong perceptual aliasing conditions in an indoor image sequence taken with a handheld camera.
On the algorithmics and applications of a mixed-norm based kernel learning formulation
- In Advances in Neural Information Processing Systems, 2009
Cited by 6 (1 self)
Motivated by real-world problems, like object categorization, we study a particular mixed-norm regularization for Multiple Kernel Learning (MKL). It is assumed that the given set of kernels is grouped into distinct components where each component is crucial for the learning task at hand. The formulation hence employs l∞ regularization for promoting combinations at the component level and l1 regularization for promoting sparsity among kernels in each component. While previous attempts have formulated this as a non-convex problem, the formulation given here is an instance of a non-smooth convex optimization problem which admits an efficient Mirror-Descent (MD) based procedure. The MD procedure optimizes over a product of simplexes, which is not a well-studied case in the literature. Results on real-world datasets show that the new MKL formulation is well-suited for object categorization tasks and that the MD-based algorithm outperforms state-of-the-art MKL solvers like simpleMKL in terms of computational effort.
A quasi-random sampling approach to image retrieval
- Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008
Cited by 4 (0 self)
In this paper, we present a novel approach to content-based image retrieval. The method hinges on the use of quasi-random sampling to retrieve those images in a database which are related to a query image provided by the user. Departing from random sampling theory, we make use of the EM algorithm so as to organize the images in the database into compact clusters that can then be used for stratified random sampling. For the purposes of retrieval, we use the similarity between the query and the clustered images to govern the sampling process within clusters. In this way, the sampling can be viewed as a stratified one which is random at the cluster level and takes into account the intra-cluster structure of the dataset. This approach leads to a measure of statistical confidence that relates to the theoretical hard limit of the retrieval performance. We show results on the Oxford Flowers dataset.
Searching the World’s Herbaria: A System for Visual Identification of Plant Species
Cited by 4 (0 self)
We describe a working computer vision system that aids in the identification of plant species. A user photographs an isolated leaf on a blank background, and the system extracts the leaf shape and matches it to the shape of leaves of known species. In a few seconds, the system displays the top matching species, along with textual descriptions and additional images. This system is currently in use by botanists at the Smithsonian Institution National Museum of Natural History. The primary contributions of this paper are: a description of a working computer vision system and its user interface for an important new application area; the introduction of three new datasets containing thousands of single leaf images, each labeled by species and verified by botanists at the US National Herbarium; recognition results for two of the three leaf datasets; and descriptions throughout of practical lessons learned in constructing this system.
Use bin-ratio information for category and scene classification
- In CVPR, 2010
Cited by 3 (2 self)
In this paper we propose using bin-ratio information, which is collected from the ratios between bin values of histograms, for scene and category classification. To use such information, a new histogram dissimilarity, bin-ratio dissimilarity (BRD), is designed. We show that BRD provides
Delving into the whorl of flower segmentation
Cited by 2 (0 self)
We describe an algorithm for automatically segmenting flowers in colour photographs. This is a challenging problem because of the sheer variety of flower classes, the intra-class variability, the variation within a particular flower, and the variability of imaging conditions (lighting, pose, foreshortening, etc.). The method couples two models: a colour model for foreground and background, and a generic shape model for the petal structure. This shape model is tolerant to viewpoint changes and petal deformations, and applicable across many different flower classes. The segmentations are produced using an MRF cost function optimized using graph cuts. The algorithm is tested on 13 flower classes and more than 750 examples. Performance is assessed against ground truth segmentations.

