Analyzing appearance and contour based methods for object categorization (2003)

by B Leibe, B Schiele
Venue: in IEEE Conf. Comput.
Results 1 - 10 of 219

A Performance Evaluation of Local Descriptors

by Krystian Mikolajczyk, Cordelia Schmid, 2005
Abstract - Cited by 1783 (51 self)
In this paper we compare the performance of descriptors computed for local interest regions, as for example extracted by the Harris-Affine detector [32]. Many different descriptors have been proposed in the literature. However, it is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [3], steerable filters [12], PCA-SIFT [19], differential invariants [20], spin images [21], SIFT [26], complex filters [37], moment invariants [43], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor, and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
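The recall-versus-precision criterion described in this abstract can be sketched as a threshold sweep over candidate matches (a minimal illustration; the function name, input format, and sweep are assumptions, not the authors' code):

```python
def recall_precision_curve(matches):
    """matches: list of (distance, is_correct) pairs, where is_correct is
    True when the match agrees with the known image transformation.
    Returns (1 - precision, recall) points as the distance threshold grows."""
    total_correct = sum(1 for _, ok in matches if ok)
    curve = []
    tp = fp = 0
    for dist, ok in sorted(matches):   # sweep the matching-distance threshold
        if ok:
            tp += 1
        else:
            fp += 1
        recall = tp / total_correct
        one_minus_precision = fp / (tp + fp)
        curve.append((one_minus_precision, recall))
    return curve
```

A descriptor is better when its curve reaches high recall while 1 - precision stays low.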

LabelMe: A Database and Web-Based Tool for Image Annotation

by B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman , 2008
Abstract - Cited by 679 (46 self)
We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.

Citation Context

...that humans can recognize about 30000 entry-level object categories. Recent work in computer vision has shown impressive results for the detection and recognition of a few different object categories [42, 16, 22]. However, the size and contents of existing datasets, among other factors, limit current methods from scaling to thousands of object categories. Research in object detection and recognition would ben...

Sharing Features: Efficient Boosting Procedures for Multiclass Object Detection

by Antonio Torralba, Kevin P. Murphy, William T. Freeman - in CVPR, 2004
Abstract - Cited by 309 (16 self)
We consider the problem of detecting a large number of different object classes in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, which can be slow and require much training data. We present a multi-class boosting procedure (joint boosting) that reduces both the computational and sample complexity, by finding common features that can be shared across the classes. The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required is observed to scale approximately logarithmically with the number of classes. In addition, we find that the features selected by independently trained classifiers are often specific to the class, whereas the features selected by the jointly trained classifiers are more generic features, such as lines and edges.
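The claimed logarithmic scaling of shared features can be illustrated with a back-of-the-envelope count (a toy sketch, assuming each shared feature can split the classes roughly in half, as with a binary class code; this is not the joint boosting procedure itself):

```python
import math

def feature_counts(num_classes, per_split=5):
    """Contrast the O(C) feature cost of independently trained detectors
    with the O(log C) cost when features are shared across classes."""
    independent = per_split * num_classes                  # one bank per class
    shared = per_split * math.ceil(math.log2(num_classes)) # one bank per "bit"
    return independent, shared
```

For 32 classes this gives 160 class-specific features versus 25 shared ones, matching the qualitative trend the abstract reports.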

Sharing Visual Features for Multiclass And Multiview Object Detection

by Antonio Torralba, Kevin P. Murphy, William T. Freeman , 2004
Abstract - Cited by 279 (6 self)
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity, and the (training-time) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects.

Citation Context

... work on training a single multiclass classifier to distinguish between different classes of object, but this typically assumes that the object has been separated from the background (see e.g., [25], [22]). In this paper [33], we consider the combined problem of distinguishing classes from the background and from each other. This is harder than standard multiclass isolated object classification proble...

Learning methods for generic object recognition with invariance to pose and lighting

by Yann LeCun, Fu Jie Huang, Léon Bottou - in Proceedings of CVPR’04, 2004
Abstract - Cited by 253 (18 self)
We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 angles, 9 azimuths, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while Convolutional Nets yielded 14% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
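The quoted dataset size can be checked by direct arithmetic: 50 toys × 36 angles × 9 azimuths × 6 lighting conditions gives 97,200 stereo pairs, and each pair contributes two images:

```python
# Worked check of the dataset-size figure in the abstract.
toys, angles, azimuths, lightings = 50, 36, 9, 6
pairs = toys * angles * azimuths * lightings   # 97,200 stereo pairs
images = 2 * pairs                             # two images per stereo pair
assert images == 194_400
```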

Citation Context

...uthors have advocated the use of color, texture, and contours for image indexing applications [8], the detection of distinctive local features [20, 26, 25, 23], the use of global appearance templates [11, 10, 19], the extraction of silhouettes and edge information [14, 22, 8, 4, 19] and the use of pose-invariant feature histograms [9, 5, 1]. Conversely, learning-based methods operating on raw pixels or low-le...

Computer Vision: Algorithms and Applications

by Richard Szeliski , 2010
Abstract - Cited by 252 (2 self)
Abstract not found

Recognition with Local Features: The Kernel Recipe

by Christian Wallraven, 2003
Abstract - Cited by 178 (26 self)
Recent developments in computer vision have shown that local features can provide efficient representations suitable for robust object recognition. Support Vector Machines have been established as powerful learning algorithms with good generalization capabilities. In this paper, we combine these two approaches and propose a general kernel method for recognition with local features. We show that the proposed kernel satisfies the Mercer condition and that it is suitable for many established local feature frameworks. Large-scale recognition results are presented on three different databases, which demonstrate that SVMs with the proposed kernel perform better than standard matching techniques on local features. In addition, experiments on noisy and occluded images show that local feature representations significantly outperform global approaches.
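A kernel over sets of local features can be sketched as a symmetrised sum of best matches (an assumption for illustration; the RBF base kernel, function names, and this particular construction may differ from the paper's exact kernel):

```python
import math

def base_kernel(x, y):
    # Toy base kernel on individual feature vectors (RBF, bandwidth 1).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

def matching_kernel(X, Y):
    """Compare two sets of local features: each feature in one set is
    matched to its best counterpart in the other, and the two directions
    are averaged to make the result symmetric."""
    def one_way(A, B):
        return sum(max(base_kernel(a, b) for b in B) for a in A) / len(A)
    return 0.5 * (one_way(X, Y) + one_way(Y, X))
```

The symmetrised value can then be plugged into an SVM as the similarity between two images represented by their local feature sets.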

Citation Context

...bject, resulting in a view every 20°. The COGVIS-ETH database ([13], Figure 1, middle row) is a recently released database, consisting of 80 objects from 8 different categories (apple, tomato, pear, toy-cows, toy-horses, toy-dogs, toy-cars and cups). Each object is re...

Shape classification using the inner-distance

by Haibin Ling, David W. Jacobs - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
Abstract - Cited by 174 (7 self)
Part structure and articulation are of fundamental importance in computer and human vision. We propose using the inner-distance to build shape descriptors that are robust to articulation and capture part structure. The inner-distance is defined as the length of the shortest path between landmark points within the shape silhouette. We show that it is articulation insensitive and more effective at capturing part structures than the Euclidean distance. This suggests that the inner-distance can be used as a replacement for the Euclidean distance to build more accurate descriptors for complex shapes, especially for those with articulated parts. In addition, texture information along the shortest path can be used to further improve shape classification. With this idea, we propose three approaches to using the inner-distance. The first method combines the inner-distance and multidimensional scaling (MDS) to build articulation invariant signatures for articulated shapes. The second method uses the inner-distance to build a new shape descriptor based on shape contexts. The third one extends the second one by considering the texture information along shortest paths. The proposed approaches have been tested on a variety of shape databases, including an articulated shape data set, MPEG7 CE-Shape-1, Kimia silhouettes, the ETH-80 data set, two leaf data sets, and a human motion silhouette data set. In all the experiments, our methods demonstrate effective performance compared with other algorithms.
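On a discrete silhouette, the inner-distance defined above can be sketched as a shortest path constrained to stay inside the shape (a minimal 4-connected BFS illustration; the binary-grid representation and landmark pixels are assumptions, not the paper's continuous formulation):

```python
from collections import deque

def inner_distance(grid, start, goal):
    """Shortest 4-connected path length between two pixels that stays
    inside the shape (cells with value 1); returns None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
               and grid[nr][nc] == 1 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None  # goal lies outside the connected silhouette
```

On a U-shaped silhouette the two prongs are close in Euclidean distance but far in inner-distance, which is exactly why the measure is insensitive to articulation: bending a part changes Euclidean distances but barely changes paths inside the shape.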

Interleaved object categorization and segmentation

by Bastian Leibe, Bernt Schiele - in BMVC, 2003
Abstract - Cited by 162 (8 self)
Historically, figure-ground segmentation has been seen as an important and even necessary precursor for object recognition. In that context, segmentation is mostly defined as a data-driven, that is bottom-up, process. Since, for humans, object recognition and segmentation are heavily intertwined processes, it has been argued that top-down knowledge from object recognition can and should be used for guiding the segmentation process. In this paper, we present a method for the categorization of unfamiliar objects in difficult real-world scenes. The method generates object hypotheses without prior segmentation that can be used to obtain a category-specific figure-ground segmentation. In particular, the proposed approach uses a probabilistic formulation to incorporate knowledge about the recognized category as well as the supporting information in the image to segment the object from the background. This segmentation can then be used for hypothesis verification, to further improve recognition performance. Experimental results show the capacity of the approach to categorize and segment object categories as diverse as cars and cows.

Citation Context

...section describes how our algorithm achieves this by learning a codebook of local appearance. Figure 1: (a,b) Some of the training objects used for cows and cars (from the ETH-80 database [8]). From each object, 16 views were taken from different orientations. (c) Example codebook clusters for cars with their corresponding patches. 3 A Codebook of Local Appearance for Object Categorizatio...

A visual category filter for Google images

by R Fergus, P Perona, A Zisserman - Proc. of European Conference on Computer Vision, 2004
Abstract - Cited by 138 (12 self)
Abstract not found

Citation Context

...portantly, in the image search scenario the object is actually only present in a sub-set of the images, and this sub-set (and even its size) is unknown. While methods exist to model object categories [9, 13, 15], it is essential that the approach can learn from a contaminated training set with a minimal amount of supervision. We therefore use the method of Fergus et al. [10], extending it to allow the parts ...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University