
Pictorial structures revisited: People detection and articulated pose estimation (0)

by M Andriluka, S Roth, B Schiele
Venue: In CVPR’09
Results 1 - 10 of 211

Computer Vision: Algorithms and Applications

by Richard Szeliski , 2010
Cited by 252 (2 self)
Abstract not found

Modeling mutual context of object and human pose in human-object interaction activities

by Bangpeng Yao, Li Fei-Fei, 2010
Cited by 155 (5 self)
Detecting objects in cluttered scenes and estimating articulated human body parts are two challenging problems in computer vision. The difficulty is particularly pronounced in activities involving human-object interactions (e.g. playing tennis), where the relevant object tends to be small or only partially visible, and the human body parts are often self-occluded. We observe, however, that objects and human poses can serve as mutual context to each other – recognizing one facilitates the recognition of the other. In this paper we propose a new random field model to encode the mutual context of objects and human poses in human-object interaction activities. We then cast the model learning task as a structure learning problem, in which the structural connectivity between the object, the overall human pose, and different body parts is estimated through a structure search approach, and the parameters of the model are estimated by a new max-margin algorithm. On a sports data set of six classes of human-object interactions [12], we show that our mutual context model significantly outperforms the state of the art in detecting very difficult objects and human poses.

Citation Context

... The two central tasks, human pose estimation and object detection, have been studied in computer vision for many years. Most of the pose estimation work uses a tree structure of the human body [10, 26, 1], which allows fast inference. In order to capture more complex body articulations, some non-tree models have also been proposed [27, 31]. Although those methods have been demonstrated to work well on ...
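The context above notes that a tree-structured body model allows fast inference. As a rough illustration only (not the method of any paper listed here), the following Python sketch performs exact MAP inference on a tree of parts by max-sum dynamic programming, assuming each part has a small discrete set of candidate locations. The function `tree_max_sum`, the part names, and the `pairwise` scoring callback are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scoring callback: pairwise(parent, child, i, j) scores the parent
# at candidate i together with the child at candidate j (higher is better).
Pairwise = Callable[[str, str, int, int], float]


def tree_max_sum(
    children: Dict[str, List[str]],
    root: str,
    unary: Dict[str, List[float]],
    pairwise: Pairwise,
) -> Tuple[float, Dict[str, int]]:
    """Exact MAP inference on a tree of parts via leaf-to-root max-sum messages."""
    # For each (parent, parent_state), the best state of every child.
    best_child: Dict[Tuple[str, int], Dict[str, int]] = {}

    def subtree_scores(part: str) -> List[float]:
        # Best score of the subtree rooted at `part`, for each candidate of `part`.
        totals = list(unary[part])
        for child in children.get(part, []):
            child_scores = subtree_scores(child)
            for i in range(len(totals)):
                # Maximize over the child's candidate given the parent's candidate i.
                j_best = max(
                    range(len(child_scores)),
                    key=lambda j: child_scores[j] + pairwise(part, child, i, j),
                )
                totals[i] += child_scores[j_best] + pairwise(part, child, i, j_best)
                best_child.setdefault((part, i), {})[child] = j_best
        return totals

    root_scores = subtree_scores(root)
    root_state = max(range(len(root_scores)), key=root_scores.__getitem__)

    # Backtrack from the root to recover the maximizing configuration.
    assignment = {root: root_state}
    stack = [root]
    while stack:
        part = stack.pop()
        for child in children.get(part, []):
            assignment[child] = best_child[(part, assignment[part])][child]
            stack.append(child)
    return root_scores[root_state], assignment
```

Exhaustive search costs k^n for n parts with k candidates each, while this recursion is O(n k^2); pictorial-structure implementations commonly reduce it further with distance transforms when the pairwise term is a quadratic deformation cost.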

Detecting People Using Mutually Consistent Poselet Activations

by Lubomir Bourdev, Subhransu Maji, Thomas Brox, Jitendra Malik
Cited by 142 (28 self)
Bourdev and Malik (ICCV 09) introduced a new notion of parts, poselets, constructed to be tightly clustered both in the configuration space of keypoints, as well as in the appearance space of image patches. In this paper we develop a new algorithm for detecting people using poselets. Unlike that work, which used 3D annotations of keypoints, we use only 2D annotations, which are much easier for naive human annotators. The main algorithmic contribution is in how we use the pattern of poselet activations. Individual poselet activations are noisy, but considering the spatial context of each can provide vital disambiguating information, just as object detection can be improved by considering the detection scores of nearby objects in the scene. This can be done by training a two-layer feed-forward network with weights set using a max-margin technique. The refined poselet activations are then clustered into mutually consistent hypotheses where consistency is based on empirically determined spatial keypoint distributions. Finally, bounding boxes are predicted for each person hypothesis and shape masks are aligned to edges in the image to provide a segmentation. To the best of our knowledge, the resulting system is the current best performer on the task of people detection and segmentation with an average precision of 47.8% and 40.5%, respectively, on PASCAL VOC 2009.

Citation Context

...the silhouette $g : \mathbb{R}^2 \to \{0, 1\}$ of the predicted binary mask. We then estimate the deformation field $(u, v)$, with $u, v : \mathbb{R}^2 \to \mathbb{R}$, that minimizes

$$E(u, v) = \int_{\mathbb{R}^2} \Big( |f(x, y) - g(x + u, y + v)| + \alpha \left( |\nabla u|^2 + |\nabla v|^2 \right) \Big) \, dx \, dy. \qquad (5)$$

The parameter $\alpha = 50$ determines the amount of flexibility granted to the deformation field. We use a coarse-to-fine numerical scheme known from optical flow estimation to compute the minimizer of...
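To make the two terms of energy (5) concrete, here is a minimal Python sketch that evaluates a discretized version of $E(u, v)$ on a pixel grid. It does not minimize the energy (the quoted text uses a coarse-to-fine optical-flow-style solver for that). The discretization choices are assumptions, not from the source: forward differences for the gradients, nearest-neighbour sampling of $g$, and clamping at the image border. The function name `deformation_energy` is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def deformation_energy(f: Grid, g: Grid, u: Grid, v: Grid, alpha: float = 50.0) -> float:
    """Discretized E(u, v): data term |f - warped g| plus alpha-weighted smoothness.

    Assumed discretization: nearest-neighbour lookup of g with border clamping,
    forward differences for the gradients (taken as zero at the last row/column).
    """
    h, w = len(f), len(f[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            # Data term |f(x, y) - g(x + u, y + v)| with nearest-neighbour sampling.
            xs = min(max(int(round(x + u[y][x])), 0), w - 1)
            ys = min(max(int(round(y + v[y][x])), 0), h - 1)
            energy += abs(f[y][x] - g[ys][xs])
            # Smoothness term alpha * (|grad u|^2 + |grad v|^2).
            for field in (u, v):
                dx = field[y][x + 1] - field[y][x] if x + 1 < w else 0.0
                dy = field[y + 1][x] - field[y][x] if y + 1 < h else 0.0
                energy += alpha * (dx * dx + dy * dy)
    return energy
```

A larger `alpha` penalizes non-smooth fields more strongly, which matches the quoted role of $\alpha$ as controlling how much flexibility the deformation field is granted.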

Monocular 3D Pose Estimation and Tracking by Detection

by Mykhaylo Andriluka, Stefan Roth, Bernt Schiele
Cited by 101 (13 self)
Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. These two stages successfully accumulate the available 2D image evidence into robust estimates of 2D limb positions over short image sequences (tracklets). The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.

Citation Context

... To estimate these reliably from single frames, the first stage (Sec. 2) builds on a recently proposed part-based people detection and pose estimation framework based on discriminative part detectors [3]. To accumulate further 2D image evidence, the second stage (Sec. 3) extracts people tracklets from a small number of consecutive frames using a 2D-tracking-by-detection approach. Here, the output of ...
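The second stage links per-frame detections into short tracklets. The source does not describe the association rule, so the following Python sketch swaps in a simple, commonly used stand-in: greedy frame-to-frame matching by bounding-box intersection-over-union (IoU). The function names, the `(x1, y1, x2, y2)` box format, and the `min_iou = 0.5` default are all assumptions for illustration, not the authors' method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tracklets(frames: List[List[Box]], min_iou: float = 0.5) -> List[List[Tuple[int, Box]]]:
    """Greedily link per-frame detections into tracklets of (frame index, box)."""
    tracklets: List[List[Tuple[int, Box]]] = []
    active: List[int] = []  # tracklets extended in the previous frame
    for t, boxes in enumerate(frames):
        # Score every (active tracklet, detection) pair, highest overlap first.
        pairs = sorted(
            ((iou(tracklets[k][-1][1], box), k, d) for k in active for d, box in enumerate(boxes)),
            reverse=True,
        )
        used_tracks, used_dets, next_active = set(), set(), []
        for overlap, k, d in pairs:
            if overlap < min_iou:
                break  # remaining pairs overlap even less
            if k in used_tracks or d in used_dets:
                continue
            tracklets[k].append((t, boxes[d]))
            used_tracks.add(k)
            used_dets.add(d)
            next_active.append(k)
        # Unmatched detections start new tracklets; unmatched tracklets end.
        for d, box in enumerate(boxes):
            if d not in used_dets:
                tracklets.append([(t, box)])
                next_active.append(len(tracklets) - 1)
        active = next_active
    return tracklets
```

Ending a tracklet as soon as it misses a frame keeps tracklets short and conservative, in the spirit of the quoted "small number of consecutive frames".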

Hough Forests for Object Detection, Tracking, and Action Recognition

by Juergen Gall, Angela Yao, Nima Razavi, Luc Van Gool, Victor Lempitsky
Cited by 97 (23 self)
The paper introduces Hough forests which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve high detection accuracy since the entries of such codebooks are optimized to cast Hough votes with small variance, and since their efficiency permits dense sampling of local image patches or video cuboids during detection. The efficacy of Hough forests for a set of computer vision tasks is validated through experiments on a large set of publicly available benchmark datasets and comparisons with the state-of-the-art.
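The generalized Hough transform described above has each local patch cast votes for likely object centres, with detections found at peaks of the accumulated votes. The following Python sketch shows only that accumulation step. It assumes each patch's votes, given as (offset, weight) pairs, have already been retrieved (in a Hough forest they would come from the leaves the patch reaches). Tree traversal, multi-scale voting, and smoothing of the accumulator are omitted, and the names `hough_vote` and `best_hypothesis` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Vote = Tuple[int, int, float]  # assumed format: (dx, dy, weight)


def hough_vote(patches: Iterable[Tuple[int, int, List[Vote]]]) -> Dict[Tuple[int, int], float]:
    """Accumulate weighted votes: a patch at (x, y) votes for centre (x + dx, y + dy)."""
    acc: Dict[Tuple[int, int], float] = defaultdict(float)
    for x, y, votes in patches:
        for dx, dy, weight in votes:
            acc[(x + dx, y + dy)] += weight
    return dict(acc)


def best_hypothesis(acc: Dict[Tuple[int, int], float]) -> Tuple[Tuple[int, int], float]:
    """Return the centre with the highest accumulated vote mass."""
    return max(acc.items(), key=lambda kv: kv[1])
```

For example, three patches whose offsets agree on the same centre reinforce it, while a stray vote stays weak. Offsets with small variance, as the abstract emphasizes, concentrate the vote mass and sharpen the peak.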

Better appearance models for pictorial structures

by Marcin Eichner, Vittorio Ferrari, 2008
Cited by 89 (11 self)
We present a novel approach for estimating body part appearance models for pictorial structures. We learn latent relationships between the appearance of different body parts from annotated images, which then help in estimating better appearance models on novel images. The learned appearance models are general, in that they can be plugged into any pictorial structure engine. In a comprehensive evaluation we demonstrate the benefits brought by the new appearance models to an existing articulated human pose estimation algorithm, on hundreds of highly challenging images from the TV series Buffy the Vampire Slayer and the PASCAL VOC 2008 challenge.

Cascaded Models for Articulated Pose Estimation

by Benjamin Sapp, Alexander Toshev, Ben Taskar - ECCV 2010, 2010
Cited by 72 (4 self)
We address the problem of articulated human pose estimation by learning a coarse-to-fine cascade of pictorial structure models. While the fine-level state-space of poses of individual parts is too large to permit the use of rich appearance models, most possibilities can be ruled out by efficient structured models at a coarser scale. We propose to learn a sequence of structured models at different pose resolutions, where coarse models filter the pose space for the next level via their max-marginals. The cascade is trained to prune as much as possible while preserving true poses for the final level pictorial structure model. The final level uses much more expensive segmentation, contour and shape features in the model for the remaining filtered set of candidates. We evaluate our framework on the challenging Buffy and PASCAL human pose datasets, improving the state-of-the-art.
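The abstract says coarse models filter the pose space "via their max-marginals". The max-marginal of a part state is the best total score of any full configuration that uses that state. The following Python sketch computes exact max-marginals on a simplified chain of parts (rather than a full tree) with forward and backward max-sum passes, then prunes states below a threshold. The threshold form, interpolating between the maximum and the mean max-marginal with a parameter `alpha`, is our assumption; the abstract does not give it. The function names are hypothetical.

```python
from typing import List

# unary[i][s]: score of part i in state s; pair[i][s][t]: score of part i in
# state s next to part i + 1 in state t (a chain, assumed for simplicity).


def chain_max_marginals(unary: List[List[float]], pair: List[List[List[float]]]) -> List[List[float]]:
    """Exact max-marginals on a chain via forward/backward max-sum passes."""
    n = len(unary)
    # fwd[i][t]: best score of parts 0..i with part i in state t.
    fwd = [list(unary[0])]
    for i in range(1, n):
        fwd.append([
            unary[i][t] + max(fwd[i - 1][s] + pair[i - 1][s][t] for s in range(len(unary[i - 1])))
            for t in range(len(unary[i]))
        ])
    # bwd[i][s]: best score of parts i+1..n-1 given part i in state s.
    bwd = [[0.0] * len(u) for u in unary]
    for i in range(n - 2, -1, -1):
        bwd[i] = [
            max(pair[i][s][t] + unary[i + 1][t] + bwd[i + 1][t] for t in range(len(unary[i + 1])))
            for s in range(len(unary[i]))
        ]
    return [[fwd[i][s] + bwd[i][s] for s in range(len(unary[i]))] for i in range(n)]


def prune_states(max_marginals: List[List[float]], alpha: float = 0.5) -> List[List[int]]:
    """Keep states whose max-marginal reaches an assumed max/mean interpolated threshold."""
    flat = [m for part in max_marginals for m in part]
    threshold = alpha * max(flat) + (1 - alpha) * sum(flat) / len(flat)
    return [[s for s, m in enumerate(part) if m >= threshold] for part in max_marginals]
```

Because every state of the highest-scoring configuration has a max-marginal equal to the global maximum, that configuration always survives pruning for `alpha` up to 1. Lowering `alpha` keeps more candidates for the next, more expensive level.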

New features and insights for pedestrian detection

by Stefan Walk, Nikodem Majer, Konrad Schindler, Bernt Schiele - In CVPR , 2010
Cited by 68 (5 self)
Despite impressive progress in people detection, the performance on challenging datasets like Caltech Pedestrians or TUD-Brussels is still unsatisfactory. In this work we show that motion features derived from optic flow yield substantial improvements on image sequences, if implemented correctly, even in the case of low-quality video and consequently degraded flow fields. Furthermore, we introduce a new feature, self-similarity on color channels, which consistently improves detection performance both for static images and for video sequences, across different datasets. In combination with HOG, these two features outperform the state-of-the-art by up to 20%. Finally, we report two insights concerning detector evaluations, which apply to classifier-based object detection in general. First, we show that a commonly underestimated detail of training, the number of bootstrapping rounds, has a drastic influence on the relative (and absolute) performance of different feature/classifier combinations. Second, we discuss important intricacies of detector evaluation and show that current benchmarking protocols lack crucial details, which can distort evaluations.
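The abstract introduces "self-similarity on color channels" as a feature. One plausible reading, sketched below under stated assumptions, is a vector of pairwise similarities between the color histograms of all cells in a detection window. This Python sketch assumes the per-cell color histograms are already computed (color space and binning are left open) and uses histogram intersection on normalized histograms as the similarity. The function names are hypothetical, and the exact construction in the paper may differ.

```python
from itertools import combinations
from typing import List


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity of two normalized histograms: sum of bin-wise minima (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def color_self_similarity(cell_histograms: List[List[float]]) -> List[float]:
    """One similarity value per unordered pair of cells in the window."""
    normed = []
    for h in cell_histograms:
        s = sum(h)
        normed.append([v / s for v in h] if s > 0 else list(h))
    return [histogram_intersection(a, b) for a, b in combinations(normed, 2)]
```

For n cells the vector has n(n - 1)/2 entries, so it encodes which parts of the window share colors (e.g. two regions covered by the same clothing) rather than the colors themselves.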

Citation Context

...er. An important insight of past research is that powerful articulated models, which can adapt to variations in body pose, only help in the presence of strong pose variations, such as in sport scenes [1]. On the contrary, the most successful model to date for “normal” pedestrians, who are usually standing or walking upright, is still a monolithic global descriptor for the entire search window. With su...

Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation

by Sam Johnson, Mark Everingham , 2010
Cited by 65 (1 self)
We investigate the task of 2D articulated human pose estimation in unconstrained still images. This is extremely challenging because of variation in pose, anatomy, clothing, and imaging conditions. Current methods use simple models of body part appearance and plausible configurations due to limitations of available training data and constraints on computational expense. We show that such models severely limit accuracy. Building on the successful pictorial structure model (PSM) we propose richer models of both appearance and pose, using state-of-the-art discriminative classifiers without introducing unacceptable computational expense. We introduce a new annotated database of challenging consumer images, an order of magnitude larger than currently available datasets, and demonstrate over 50% relative improvement in pose estimation accuracy over a state-of-the-art method.

Learning hierarchical poselets for human parsing

by Yang Wang, Duan Tran, Zicheng Liao - In CVPR’11
Cited by 52 (2 self)
We consider the problem of human parsing with part-based models. Most previous work in part-based models only considers rigid parts (e.g. torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate for human parsing. In this paper, we introduce hierarchical poselets – a new representation for human parsing. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g. torso + left arm). In the extreme case, they can be the whole bodies. We develop a structured model to organize poselets in a hierarchical way and learn the model parameters in a max-margin framework. We demonstrate the superior performance of our proposed approach on two datasets with aggressive pose variations.

Citation Context

...ative locations of part locations with respect to a person detection, the relationship between different part appearances (e.g. upper-arm and torso tend to have the same color), etc. Andriluka et al. [1] build better edge-based appearance models using shape contexts. Sapp et al. [20] develop an efficient inference algorithm to allow the use of more expensive features. There is also work [15, 13, 23] on ...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University