Class-specific hough forests for object detection
- In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009
Cited by 24 (10 self)
We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. As in some previous work, this is accomplished via a generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the Hough image that accumulates the votes from all parts. However, whereas previous methods detect object parts using generative codebooks of part appearances, we take a more discriminative approach to object part detection. To this end, we train a class-specific Hough forest, which is a random forest that directly maps the image patch appearance to the probabilistic vote about the possible location of the object centroid. We demonstrate that Hough forests significantly improve the results of Hough-transform object detection and achieve state-of-the-art performance for several classes and datasets.
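The voting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the patch format, the function names `accumulate_votes` and `strongest_hypothesis`, and the hand-written votes are all assumptions made for illustration. In the paper the votes would come from the leaves of a trained Hough forest; here they are simply given as (offset, weight) pairs.

```python
from typing import List, Tuple

# Hypothetical formats (not from the paper):
# a vote is (dx, dy, weight): offset from the patch to the centroid, plus its weight.
Vote = Tuple[int, int, float]
# a patch is ((x, y), votes): the patch position and the votes it casts.
Patch = Tuple[Tuple[int, int], List[Vote]]


def accumulate_votes(height: int, width: int, patches: List[Patch]) -> List[List[float]]:
    """Build a Hough image by summing the weighted votes of all patches."""
    hough = [[0.0] * width for _ in range(height)]
    for (x, y), votes in patches:
        for dx, dy, weight in votes:
            cx, cy = x + dx, y + dy
            # Votes that fall outside the image are discarded.
            if 0 <= cx < width and 0 <= cy < height:
                hough[cy][cx] += weight
    return hough


def strongest_hypothesis(hough: List[List[float]]) -> Tuple[int, int, float]:
    """Return (x, y, score) of the global maximum of the Hough image."""
    score, x, y = max(
        (value, x, y)
        for y, row in enumerate(hough)
        for x, value in enumerate(row)
    )
    return x, y, score
```

A real detector would look for several local maxima (typically after smoothing the Hough image) rather than a single global maximum; the sketch keeps only the accumulation idea.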
A Hough transform-based voting framework for action recognition
- In: CVPR, 2010
Cited by 6 (3 self)
We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely sampled feature patches and their corresponding votes in a spatio-temporal-action Hough space. The leaves of the trees form a discriminative multi-class codebook that shares features between the action classes and votes for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.
Shared Parts for Deformable Part-based Models
Cited by 5 (0 self)
The deformable part-based model (DPM) proposed by Felzenszwalb et al. has demonstrated state-of-the-art results in object localization. The model offers a high degree of learnt invariance by utilizing viewpoint-dependent mixture components and movable parts in each mixture component. One might hope to increase the accuracy of the DPM by increasing the number of mixture components and parts to give a more faithful model, but limited training data prevents this from being effective. We propose an extension to the DPM which allows for sharing of object part models among multiple mixture components as well as object classes. This results in more compact models and allows training examples to be shared by multiple components, ameliorating the effect of a limited-size training set. We (i) reformulate the DPM to incorporate part sharing, and (ii) propose a novel energy function allowing for coupled training of mixture components and object classes. We report state-of-the-art results on the PASCAL VOC dataset.
Learning Mixed Templates for Object Recognition
Cited by 2 (1 self)
This article proposes a method for learning object templates composed of local sketches and local textures, and investigates the relative importance of the sketches and textures for different object categories. Local sketches and local textures in the object templates account for shapes and appearances respectively. Both local sketches and local textures are extracted from the maps of Gabor filter responses. The local sketches are captured by the local maxima of Gabor responses, where the local maximum pooling accounts for shape deformations in objects. The local textures are captured by the local averages of Gabor filter responses, where the local average pooling extracts texture information for appearances. The selection of local sketch variables and local texture variables can be accomplished by a projection pursuit type of learning process, where both types of variables can be compared and merged within a common framework. The learning process returns a generative model for image intensities from a relatively small number of training images. The recognition or classification by template matching can then be based on log-likelihood ratio scores. We apply the learning method to a variety of object and texture categories. The results show that both the sketches and textures are useful for classification, and they complement each other.
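The difference between the two pooling operations mentioned in this abstract can be shown with a small sketch. It assumes a precomputed 2D map of filter responses given as a plain list of lists; the function name `local_pool`, its parameters, and the square window shape are illustrative assumptions, not taken from the article, and no Gabor filtering is performed here.

```python
from typing import List


def local_pool(responses: List[List[float]], cx: int, cy: int,
               radius: int, mode: str) -> float:
    """Pool filter responses in a square window centred on (cx, cy).

    mode="max" keeps the strongest response in the window, which tolerates
    small shifts of an edge (sketch); mode="avg" averages the responses,
    which summarises texture. The window is clipped at the map border.
    """
    height, width = len(responses), len(responses[0])
    window = [
        responses[y][x]
        for y in range(max(0, cy - radius), min(height, cy + radius + 1))
        for x in range(max(0, cx - radius), min(width, cx + radius + 1))
    ]
    if mode == "max":
        return max(window)
    if mode == "avg":
        return sum(window) / len(window)
    raise ValueError("mode must be 'max' or 'avg'")
```

With max pooling, a strong response that moves by a pixel inside the window leaves the pooled value unchanged, which is one way to read the abstract's remark that local maximum pooling accounts for shape deformations.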
A new distance for scale-invariant 3D shape recognition and registration
- In Proceedings of ICCV, 2011
Cited by 1 (1 self)
This paper presents a method for vote-based 3D shape recognition and registration, in particular using mean shift on 3D pose votes in the space of direct similarity transforms for the first time. We introduce a new distance between poses in this space, the SRT distance. It is left-invariant, unlike Euclidean distance, and has a unique, closed-form mean, in contrast to Riemannian distance, so it is fast to compute. We demonstrate improved performance over the state of the art in both recognition and registration on a real and challenging dataset, by comparing our distance with others in a mean shift framework, as well as with the commonly used Hough voting approach.
Backprojection Revisited: Scalable Multi-view Object Detection and Similarity Metrics for Detections
Cited by 1 (1 self)
Hough transform based object detectors learn a mapping from the image domain to a Hough voting space. Within this space, object hypotheses are formed by local maxima. The votes contributing to a hypothesis are called its support. In this work, we investigate the use of the support and its backprojection to the image domain for multi-view object detection. To this end, we create a shared codebook with training and matching complexities independent of the number of quantized views. We show that, since backprojection encodes enough information about the viewpoint, all views can be handled together. In our experiments, we demonstrate that superior accuracy and efficiency can be achieved in comparison to the popular one-vs-the-rest detectors by treating views jointly, especially with few training examples and no view annotations. Furthermore, we go beyond the detection case and, based on the support, introduce a part-based similarity measure between two arbitrary detections which naturally takes spatial relationships of parts into account and is insensitive to partial occlusions. We also show that backprojection can be used to efficiently measure the similarity of a detection to all training examples. Finally, we demonstrate how these metrics can be used to estimate continuous object parameters such as human pose and an object’s viewpoint. In our experiments, we achieve state-of-the-art performance for view classification on the PASCAL VOC’06 dataset.
Learning Hybrid Image Templates (HIT) by Information Projection
- For review: IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 1 (1 self)
This paper presents a novel framework for learning a generative image representation, the hybrid image template (HIT), from a small number (i.e., 3 to 20) of image examples. Each learned template is composed of, typically, 50 to 500 image patches whose geometric attributes (location, scale, orientation) may adapt in a local neighborhood for deformation, and whose appearances are characterized respectively by four types of descriptors: local sketch (edge or bar), texture gradients with orientations, flatness regions, and colors. These heterogeneous patches are automatically ranked and selected from a large pool according to their information gains using an information projection framework. Intuitively, a patch has a higher information gain if (i) its feature statistics are consistent within the training examples and are distinctive from the statistics of negative examples (i.e., generic images or examples from other categories); and (ii) its feature statistics have less intra-class variation. The learning process pursues the most informative (for either generative or discriminative purposes) patches one at a time and stops when the information gain is within statistical fluctuation. The template is associated with a well-normalized probability model that integrates the heterogeneous feature statistics. This automated feature selection procedure allows our algorithm to scale up to a wide range of image categories, from those with regular shapes to those with stochastic texture. The learned representation captures the intrinsic characteristics of the object or scene categories. We evaluate the hybrid image templates on several public benchmarks, and demonstrate classification performance on par with state-of-the-art methods like HoG+SVM; when small training sample sizes are used, the proposed system shows a clear advantage.
Affordance Prediction via Learned Object Attributes
Cited by 1 (1 self)
We present a novel method for learning and predicting the affordances of an object based on its physical and visual attributes. Affordance prediction is a key task in autonomous robot learning, as it allows a robot to reason about the actions it can perform in order to accomplish its goals. Previous approaches to affordance prediction have either learned direct mappings from visual features to affordances, or have introduced object categories as an intermediate representation. In this paper, we argue that physical and visual attributes provide a more appropriate mid-level representation for affordance prediction, because they support information sharing between affordances and objects, resulting in superior generalization performance. In particular, affordances are more likely to be correlated with the attributes of an object than they are with its visual appearance or a linguistically derived object category. We provide preliminary experimental validation of our method, and present empirical comparisons to both the direct and category-based approaches to affordance prediction. Our encouraging results suggest the promise of the attribute-based approach to affordance prediction.
Internship supervisor: Jean-Yves Audibert (audibert at imagine.enpc.fr)
Internship proposal: Segmentation of an object in digital images. Theme: "Machine Learning" and image processing. Duration: 4-6 months
Evaluating multi-class learning strategies in a hierarchical framework for object detection

