Results 1 - 10
of
14
Multi-scale object detection by clustering lines
- In: ICCV (2009
"... Object detection in cluttered, natural scenes has a high complexity since many local observations compete for object hypotheses. Voting methods provide an efficient solution to this problem. When Hough voting is extended to location and scale, votes naturally become lines through scale space due to ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Object detection in cluttered, natural scenes has a high complexity since many local observations compete for object hypotheses. Voting methods provide an efficient solution to this problem. When Hough voting is extended to location and scale, votes naturally become lines through scale space due to the local scale-location-ambiguity. In contrast to this, current voting methods stick to the location-only setting and cast point votes, which require local estimates of scale. Rather than searching for object hypotheses in the Hough accumulator, we propose a weighted, pairwise clustering of voting lines to obtain globally consistent hypotheses directly. In essence, we propose a hierarchical approach that is based on a sparse representation of object boundary shape. Clustering of voting lines (CVL) condenses the information from these edge points in few, globally consistent candidate hypotheses. A final verification stage concludes by refining the candidates. Experiments on the ETHZ shape dataset show that clustering voting lines significantly improves state-of-the-art Hough voting techniques. 1.
The Evolution of Object Categorization and the Challenge of Image Abstraction
"... Technical University. During my visit, a graduate student was kind enough to show me around Prague, including a visit to the Museum of Modern and Contemporary Art (Veletr˘zní Palác). It was there that I saw the sculpture ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Technical University. During my visit, a graduate student was kind enough to show me around Prague, including a visit to the Museum of Modern and Contemporary Art (Veletr˘zní Palác). It was there that I saw the sculpture
A Multiple Kernel Learning Approach to Joint Multi-class Object Detection
"... Abstract. Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers. They decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, sin ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract. Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers. They decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, since this would allow the use of contextual information such as co-occurrence between classes. However, this approach is usually not employed because of its computational cost. In this paper we propose a method to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes. By following a multiple kernel learning (MKL) approach, we automatically obtain a sparse dependency graph of relevant object classes on which to base the decision. Experiments on the PASCAL VOC 2006 and 2007 datasets show that the subsequent joint decision step clearly improves the accuracy compared to single class detection. 1
Learning Hierarchical Compositional Representations of Object Structure
"... Visual categorization of objects has captured the attention of the vision community for decades [10]. The increased popularity of the problem witnessed in the recent years and the advent of powerful computer hardware have led to a seeming success of categorization approaches on the standard datasets ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Visual categorization of objects has captured the attention of the vision community for decades [10]. The increased popularity of the problem witnessed in the recent years and the advent of powerful computer hardware have led to a seeming success of categorization approaches on the standard datasets such as
Seeing the Objects Behind the Dots: Recognition in Videos from a Moving Camera
- INTERNATIONAL JOURNAL OF COMPUTER VISION
"... Category-level object recognition, segmentation, and tracking in videos becomes highly challenging when applied to sequences from a hand-held camera that features extensive motion and zooming. An additional challenge is then to develop a fully automatic video analysis system that works without manu ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Category-level object recognition, segmentation, and tracking in videos becomes highly challenging when applied to sequences from a hand-held camera that features extensive motion and zooming. An additional challenge is then to develop a fully automatic video analysis system that works without manual initialization of a tracker or other human intervention, both during training and during recognition, despite background clutter and other distracting objects. Moreover, our working hypothesis states that categorylevel recognition is possible based only on an erratic, flickering pattern of interest point locations without extracting additional features. Compositions of these points are then tracked individually by estimating a parametric motion model. Groups of compositions segment a video frame into the various objects that are present and into background clutter. Objects can then be recognized and tracked based on the motion of their compositions and on the shape they form. Finally, the combination of this flow-based representation with an appearance-based one is investigated. Besides evaluating the approach on a challenging video categorization database with significant camera motion and clutter, we also demonstrate that it generalizes to action recognition in a natural way.
Learning Grammatical Models for Object Recognition
, 2008
"... Many object recognition systems are limited by their inability to share common parts or structure among related object classes. This capability is desirable because it allows information about parts and relationships in one object class to be generalized to other classes for which it is relevant. Wi ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Many object recognition systems are limited by their inability to share common parts or structure among related object classes. This capability is desirable because it allows information about parts and relationships in one object class to be generalized to other classes for which it is relevant. With this goal in mind, we have designed a representation and recognition framework that captures structural variability and shared part structure within and among object classes. The framework uses probabilistic geometric grammars (PGGs) to represent object classes recursively in terms of their parts, thereby exploiting the hierarchical and substitutive structure inherent to many types of objects. To incorporate geometric and appearance information, we extend traditional probabilistic context-free grammars to represent distributions over the relative geometric characteristics of object parts as well as the appearance of primitive parts. We describe an e cient dynamic programming algorithm for object categorization and localization in images given a PGG model. We also develop an EM algorithm to estimate the parameters of a grammar structure from training data, and a search-based structure learning approach that nds a compact grammar to explain the image data while sharing substructure among classes. Finally, we describe a set of experiments that demonstrate empirically that the system provides a performance bene t. 1
HERACLES ET AL.: BEHAVIOR PREDICTION IN URBAN TRAFFIC ENVIRONMENTS 1 Vision-Based Behavior Prediction in Urban Traffic Environments by Scene Categorization
"... We propose a method for vision-based scene understanding in urban traffic environments that predicts the appropriate behavior of a human driver in a given visual scene. The method relies on a decomposition of the visual scene into its constituent objects by image segmentation and uses segmentation-b ..."
Abstract
- Add to MetaCart
We propose a method for vision-based scene understanding in urban traffic environments that predicts the appropriate behavior of a human driver in a given visual scene. The method relies on a decomposition of the visual scene into its constituent objects by image segmentation and uses segmentation-based features that represent both their identity and spatial properties. We show how the behavior prediction can be naturally formulated as scene categorization problem and how ground truth behavior data for learning a classifier can be automatically generated from any monocular video sequence recorded from a moving vehicle, using structure from motion techniques. We evaluate our method both quantitatively and qualitatively on the recently proposed CamVid dataset, predicting the appropriate velocity and yaw rate of the car as well as their appropriate change for both day and dusk sequences. In particular, we investigate the impact of the underlying segmentation and the number of behavior classes on the quality of these predictions. 1
3D Face Data for Craniofacial Research
, 2009
"... This is to certify that I have examined this copy of a doctoral dissertation by ..."
Abstract
- Add to MetaCart
This is to certify that I have examined this copy of a doctoral dissertation by
Learning the Compositional Structure of Man-Made Objects for 3D Shape Retrieval
"... While approaches based on local features play a more and more important role for 3D shape retrieval, the problems of feature selection and similarity measurement between sets of local features still remain open tasks. Common algorithms usually measure the similarity between two such sets by either e ..."
Abstract
- Add to MetaCart
While approaches based on local features play a more and more important role for 3D shape retrieval, the problems of feature selection and similarity measurement between sets of local features still remain open tasks. Common algorithms usually measure the similarity between two such sets by either establishing feature correspondences or by using Bag-of-Features (BoF) approaches. While establishing correspondences often involves a lot of manually chosen thresholds, BoF approaches can hardly model the spatial structure of the underlying 3D object. In this paper focusing on retrieval of 3D models representing man-made objects, we try to tackle both of these problems. Exploiting the fact that man-made objects usually consist of a small set of certain shape primitives, we propose a feature selection technique that decomposes 3D point clouds into sections that can be represented by a plane, a sphere, a cylinder, a cone, or a torus. We then introduce a probabilistic framework for analyzing and learning the spatial arrangement of the detected shape primitives with respect to training objects belonging to certain categories. The knowledge acquired in this learning process allows for efficient retrieval and classification of new 3D objects. We finally evaluate our algorithm on the recently introduced 3D Architecture Shape Benchmark, which mainly consists of 3D models representing man-made objects.
Discovering Multipart Appearance Models from Captioned Images
"... Abstract. Even a relatively unstructured captioned image set depicting a variety of objects in cluttered scenes contains strong correlations between caption words and repeated visual structures. We exploit these correlations to discover named objects and learn hierarchical models of their appearance ..."
Abstract
- Add to MetaCart
Abstract. Even a relatively unstructured captioned image set depicting a variety of objects in cluttered scenes contains strong correlations between caption words and repeated visual structures. We exploit these correlations to discover named objects and learn hierarchical models of their appearance. Revising and extending a previous technique for finding small, distinctive configurations of local features, our method assembles these co-occurring parts into graphs with greater spatial extent and flexibility. The resulting multipart appearance models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. We demonstrate improved annotation precision and recall on datasets to which the non-hierarchical technique was previously applied and show extended spatial coverage of detected objects. 1

