P.: On detection of multiple object instances using Hough transforms
, 2010
Cited by 11 (5 self)
To detect multiple objects of interest, methods based on the Hough transform use non-maxima suppression or mode seeking to locate and distinguish peaks in Hough images. Such postprocessing requires tuning of extra parameters and is often fragile, especially when objects of interest tend to be closely located. In the paper, we develop a new probabilistic framework that is in many ways related to the Hough transform, sharing its simplicity and wide applicability. At the same time, the framework bypasses the problem of identifying multiple peaks in Hough images, and permits detection of multiple objects without invoking non-maximum suppression heuristics. As a result, the experiments demonstrate a significant improvement in detection accuracy both for the classical task of straight line detection and for a more modern category-level (pedestrian) detection problem.
1. Hough Transform in Object Detection
The Hough transform [10] is one of the classical computer vision techniques, dating back 50 years. It was initially suggested as a method for line detection in edge maps of images but was then extended to detect general low-parametric objects such as circles [2]. In recent years, Hough-based methods were successfully adapted to the problem of part-based category-level object detection, where they have obtained state-of-the-art results for some popular datasets [12, 13, 7, 8, 15, 3]. Both the classical Hough transform and its more modern variants proceed by converting the input image into a new representation called the Hough image, which lives in a domain called the Hough space (Figure 1). Each point in the Hough space corresponds to a hypothesis about the object of interest being present in the original image at a particular location and configuration. Any Hough-transform-based method essentially works by splitting the input image into a set of voting elements. Each such element votes for the hypotheses that might have generated this element. For instance, a feature that fires
∗ The first two authors were with Microsoft Research through the initial stages of the work and are currently supported by Microsoft Research programs in Russia. Victor Lempitsky is also supported by EU under ERC
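The voting idea described in the excerpt above can be illustrated with the classical straight-line case. The following is a minimal sketch in plain Python, not the probabilistic framework proposed in the paper: the function names `hough_lines` and `strongest_line`, the (rho, theta) parameterization with one-degree angular bins, and the toy input are all assumptions made for illustration.

```python
import math

def hough_lines(points, width, height, n_theta=180):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge points."""
    # rho lies in [-diag, diag]; shift by diag to get a non-negative row index.
    diag = int(math.ceil(math.hypot(width, height)))
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    accumulator = [[0] * n_theta for _ in range(2 * diag + 1)]
    for x, y in points:
        # Each voting element (edge point) votes for every line
        # hypothesis that could have generated it.
        for t_idx, theta in enumerate(thetas):
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            accumulator[rho + diag][t_idx] += 1
    return accumulator, thetas, diag

def strongest_line(accumulator, thetas, diag):
    """Return (votes, rho, theta) for the first cell holding the maximum vote count."""
    best = (0, 0, 0.0)
    for r_idx, row in enumerate(accumulator):
        for t_idx, votes in enumerate(row):
            if votes > best[0]:
                best = (votes, r_idx - diag, thetas[t_idx])
    return best

# Toy input (hypothetical): ten collinear points on the horizontal line y = 5.
points = [(x, 5) for x in range(10)]
acc, thetas, diag = hough_lines(points, width=10, height=10)
votes, rho, theta = strongest_line(acc, thetas, diag)
```

In this toy input, cells a few degrees either side of theta = pi/2 collect the same ten votes as the exact cell, so the reported angle is only approximately pi/2. This is a small instance of why locating and separating peaks in a Hough image needs care.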
A Hough transform-based voting framework for action recognition
- IN: CVPR
, 2010
Cited by 6 (3 self)
We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely sampled feature patches and their corresponding votes in a spatio-temporal-action Hough space. The leaves of the trees form a discriminative multi-class codebook that shares features between the action classes and votes for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.
The chains model for detecting parts by their context
Cited by 4 (1 self)
Detecting an object part relies on two sources of information: the appearance of the part itself, and the context supplied by surrounding parts. In this paper we consider problems in which a target part cannot be recognized reliably using its own appearance, such as detecting low-resolution hands, and must be recognized using the context of surrounding parts. We develop the ‘chains model’, which can locate parts of interest in a robust and precise manner, even when the surrounding context is highly variable and deformable. In the proposed model, the relation between context features and the target part is modeled in a non-parametric manner using an ensemble of feature chains leading from parts in the context to the detection target. The method uses the configuration of the features in the image directly rather than through fitting an articulated 3-D model of the object. In addition, the chains are composable, meaning that new chains observed in the test image can be composed of sub-chains seen during training. Consequently, the model is capable of handling object poses that are infrequent, or even absent, during training. We test the approach in different settings, including object part detection as well as complete object detection. The results show the advantages of the chains model for detecting and localizing parts of complex deformable objects.
Efficient Regression of General-Activity Human Poses from Depth Images
Cited by 4 (1 self)
We present a new approach to general-activity human pose estimation from depth images, building on Hough forests. We extend existing techniques in several ways: real-time prediction of multiple 3D joints, explicit learning of voting weights, vote compression to allow larger training sets, and a comparison of several decision-tree training objectives. Key aspects of our work include regression directly from the raw depth image, without the use of an arbitrary intermediate representation; applicability to general motions (not constrained to particular activities); and the ability to localize occluded as well as visible body joints. Experimental results demonstrate that our method produces state-of-the-art results on several data sets, including the challenging MSRC-5000 pose estimation test set, at a speed of about 200 frames per second. Results on silhouettes suggest broader applicability to other imaging modalities.
(RF)² — Random Forest Random Field
Cited by 2 (0 self)
We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)². Inference of (RF)² uses the Swendsen-Wang cut algorithm, characterized by Metropolis-Hastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)². (RF)² is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)² outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
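To make the role of the two ratios concrete, the sketch below shows the standard Metropolis-Hastings acceptance rule together with one simple way a ratio could be read off a leaf's class histogram. The abstract does not specify the paper's estimator, so `histogram_ratio`, its additive smoothing, and the example counts are assumptions for illustration only.

```python
def histogram_ratio(leaf_counts, label_new, label_old, smoothing=1.0):
    """Estimate p(label_new) / p(label_old) from one leaf's class counts.

    Additive smoothing is an assumption made here to avoid division by zero.
    """
    numerator = leaf_counts.get(label_new, 0) + smoothing
    denominator = leaf_counts.get(label_old, 0) + smoothing
    return numerator / denominator

def mh_acceptance(posterior_ratio, proposal_ratio):
    """Standard Metropolis-Hastings acceptance probability for a proposed jump.

    posterior_ratio: p(new state | data) / p(old state | data)
    proposal_ratio:  q(new -> old) / q(old -> new)
    """
    return min(1.0, posterior_ratio * proposal_ratio)

# Hypothetical class histogram stored in a single leaf.
leaf = {"sky": 30, "tree": 9}
ratio = histogram_ratio(leaf, "sky", "tree")                  # (30 + 1) / (9 + 1)
accept_up = mh_acceptance(ratio, proposal_ratio=1.0)          # jump toward "sky"
accept_down = mh_acceptance(1.0 / ratio, proposal_ratio=1.0)  # jump back to "tree"
```

Working with the ratio directly means the four individual distributions never need to be estimated separately, which is the point the abstract emphasizes.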
Variations of a Hough-Voting Action Recognition System
Cited by 1 (0 self)
This paper presents two variations of a Hough-voting framework used for action recognition and shows classification results for low-resolution video and videos depicting human interactions. For low-resolution videos, where people performing actions are around 30 pixels, we adopt low-level features such as gradients and optical flow. For group actions with human-human interactions, we take the probabilistic action labels from the Hough-voting framework for single individuals and combine them into group actions using decision profiles and classifier combination.
Keywords: human action recognition, Hough-voting, video analysis, low-resolution video, group action recognition, activity recognition
Tracking People in Broadcast Sports
Cited by 1 (1 self)
We present a method for tracking people in monocular broadcast sports videos by coupling a particle filter with a vote-based confidence map of athletes, appearance features, and optical flow for motion estimation. The confidence map provides a continuous estimate of possible target locations in each frame and outperforms tracking with discrete target detections. We demonstrate the tracker on sports videos, tracking fast and articulated movements of athletes such as divers and gymnasts, and on non-sports videos, tracking pedestrians in a PETS2009 sequence.
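As a rough illustration of how a continuous confidence map can drive a particle filter, the sketch below reweights particles by the map value at their positions and forms a weighted position estimate. It covers only the measurement update; the function names, the tiny 3x3 map, and the particle set are hypothetical, and the appearance and optical-flow terms from the abstract are left out.

```python
def reweight(particles, confidence_map):
    """Weight each (x, y) particle by the confidence at its pixel, then normalize."""
    raw = [confidence_map[y][x] for x, y in particles]
    total = sum(raw)
    if total == 0:
        # No particle lies on a supported location: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in raw]

def estimate(particles, weights):
    """Weighted mean of the particle positions."""
    x = sum(w * p[0] for p, w in zip(particles, weights))
    y = sum(w * p[1] for p, w in zip(particles, weights))
    return x, y

# Hypothetical confidence map, indexed as confidence_map[row][column].
confidence_map = [
    [0.0, 0.0, 0.0],
    [0.0, 0.2, 0.6],
    [0.0, 0.0, 0.2],
]
particles = [(1, 1), (2, 1), (2, 2), (0, 0)]  # (x, y) positions
weights = reweight(particles, confidence_map)
position = estimate(particles, weights)
```

Because every particle receives a graded weight rather than a hit-or-miss score, the estimate can shift smoothly between frames, which is the advantage over discrete detections that the abstract reports.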
Backprojection Revisited: Scalable Multi-view Object Detection and Similarity Metrics for Detections
Cited by 1 (1 self)
Hough-transform-based object detectors learn a mapping from the image domain to a Hough voting space. Within this space, object hypotheses are formed by local maxima. The votes contributing to a hypothesis are called its support. In this work, we investigate the use of the support and its backprojection to the image domain for multi-view object detection. To this end, we create a shared codebook with training and matching complexities independent of the number of quantized views. We show that, since backprojection encodes enough information about the viewpoint, all views can be handled together. In our experiments, we demonstrate that treating views jointly achieves superior accuracy and efficiency compared with the popular one-vs-the-rest detectors, especially with few training examples and no view annotations. Furthermore, we go beyond the detection case and, based on the support, introduce a part-based similarity measure between two arbitrary detections that naturally takes spatial relationships of parts into account and is insensitive to partial occlusions. We also show that backprojection can be used to efficiently measure the similarity of a detection to all training examples. Finally, we demonstrate how these metrics can be used to estimate continuous object parameters such as human pose and an object’s viewpoint. In our experiments, we achieve state-of-the-art performance for view classification on the PASCAL VOC’06 dataset.
Semantic Classification by Covariance Descriptors Within a Randomized Forest
Cited by 1 (0 self)
This paper investigates an approach to semantic classification in aerial imagery that compactly integrates multiple feature cues, such as appearance and 3D height information. We therefore propose a novel technique to efficiently incorporate powerful covariance region descriptors into the decision nodes of a randomized forest framework. The concept of finding reliable binary splits is based on repeated random sampling of distributions that are specified by mean vectors and covariance matrices. The sampling strategy is related to Monte Carlo simulations and fits the learning strategy of randomized decision trees, while the covariance descriptors are exploited to perform a plausible feature cue integration. To show state-of-the-art performance, we first evaluate our proposed approach on the MSRC dataset including 21 object classes. Then, we illustrate how an additional integration of 3D information improves the classification accuracy in real-world aerial images taken from Dallas, San Francisco, and Graz. In addition, we use the available camera data and 3D information to combine the overlapping per-image classifications into a large-scale semantic description map that is directly applicable to virtual or procedural 3D modeling of urban environments.
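For readers unfamiliar with covariance region descriptors, the sketch below computes the mean vector and sample covariance matrix of per-pixel feature vectors in a region, which are the two quantities the abstract says define the sampled distributions. The function name and the two-dimensional toy features are assumptions; the paper's actual feature set and sampling scheme are not given in the abstract.

```python
def region_covariance(features):
    """Return (mean vector, sample covariance matrix) of per-pixel feature vectors."""
    n = len(features)
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    # Unbiased sample covariance: divide by n - 1.
    cov = [
        [
            sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov

# Hypothetical region with three pixels and two feature cues each,
# for example (intensity, height).
pixels = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, cov = region_covariance(pixels)
```

The descriptor has a fixed d x d size regardless of how many pixels the region contains, which is one reason it is convenient for combining cues with different units.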
Abstract
To detect multiple objects of interest, methods based on the Hough transform use non-maxima suppression or mode seeking to locate and distinguish peaks in Hough images. Such postprocessing requires tuning of extra parameters and is often fragile, especially when objects of interest tend to be closely located. In the paper, we develop a new probabilistic framework that is in many ways related to the Hough transform, sharing its simplicity and wide applicability. At the same time, the framework bypasses the problem of identifying multiple peaks in Hough images, and permits detection of multiple objects without invoking non-maximum suppression heuristics. As a result, the experiments demonstrate a significant improvement in detection accuracy both for the classical task of straight line detection and for a more modern category-level (pedestrian) detection problem.
1. Hough Transform in Object Detection
The Hough transform [12] is one of the classical computer vision techniques, dating back 50 years. It was initially suggested as a method for line detection in edge maps of images but was then extended to detect general low-parametric objects such as circles [3]. In recent years, Hough-based methods were successfully adapted to the problem of part-based category-level object detection, where they have obtained state-of-the-art results for some popular

