
Articulated pose estimation with flexible mixtures-of-parts (0)

by Y Yang, D Ramanan
Venue: In CVPR’11

Results 1 - 10 of 179

Articulated Human Detection with Flexible Mixtures-of-Parts

by Yi Yang, Deva Ramanan
Cited by 65 (2 self)
Abstract: We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, non-oriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: (1) they efficiently model articulation by sharing computation across similar warps, (2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and (3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree-structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity), with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid and articulated templates that are trained independently of each other, while we present an extensive diagnostic evaluation that suggests that flexible structure and joint training are crucial for strong performance. We present experimental results on standard benchmarks that suggest our approach is the state-of-the-art system for pose estimation, improving past work on the challenging Parse and Buffy datasets, while being orders of magnitude faster.

Citation Context

...verall detection rate, but as we argue, this unfairly favors methods that report back many candidate poses (because false positives are not penalized). Indeed, the original performance we reported in [10] appears to be inflated due to this effect. Rather, we evaluate the full test videos using our new criteria for PCK and APK. Our PCK score outperforms our PCP score, likely due to foreshortened arms i...

Discovering localized attributes for fine-grained recognition

by Kun Duan, Devi Parikh, David Crandall, Kristen Grauman - In CVPR. IEEE, 2012
Cited by 52 (1 self)
Attributes are visual concepts that can be detected by machines, understood by humans, and shared across categories. They are particularly useful for fine-grained domains where categories are closely related to one another (e.g. bird species recognition). In such scenarios, relevant attributes are often local (e.g. “white belly”), but the question of how to choose these local attributes remains largely unexplored. In this paper, we propose an interactive approach that discovers local attributes that are both discriminative and semantically meaningful from image datasets annotated only with fine-grained category labels and object bounding boxes. Our approach uses a latent conditional random field model to discover candidate attributes that are detectable and discriminative, and then employs a recommender system that selects attributes likely to be semantically meaningful. Human interaction is used to provide semantic names for the discovered attributes. We demonstrate our method on two challenging datasets, Caltech-UCSD Birds-200-2011 and Leeds Butterflies, and find that our discovered attributes outperform those generated by traditional approaches.

Citation Context

...e attribute labels; we discover these attributes automatically. Related to our work on local attribute selection is the extensive literature on learning part-based object models for recognition (e.g. [7, 8, 26]). These learning techniques usually look for highly distinctive parts – regions that are common within an object category but rare outside of it – and they make no attempt to ensure that the parts of...

Object detection using strongly-supervised deformable part models

by Hossein Azizpour, Ivan Laptev - In ECCV, 2012
Cited by 42 (3 self)
Abstract: Deformable part-based models [1, 2] achieve state-of-the-art performance for object detection, but rely on heuristic initialization during training due to the optimization of a non-convex cost function. This paper investigates limitations of such an initialization and extends earlier methods using additional supervision. We explore strong supervision in terms of annotated object parts and use it to (i) improve model initialization, (ii) optimize model structure, and (iii) handle partial occlusions. Our method is able to deal with sub-optimal and incomplete annotations of object parts and is shown to benefit from semi-supervised learning setups where part-level annotation is provided for a fraction of positive examples only. Experimental results are reported for the detection of six animal classes in PASCAL VOC 2007 and 2010 datasets. We demonstrate significant improvements in detection performance compared to the LSVM [1] and the Poselet [3] object detectors.

Citation Context

... of human body parts from manually annotated limb locations. The joint learning of appearance and deformation parameters in DPMs using part-level supervision has been used for body pose estimation in [10] and object part localization in [16]. The goal of this work is to develop and evaluate a strongly-supervised DPM framework for object detection. Our extensions of existing methods are motivated by th...

A database for fine grained activity detection of cooking activities

by Marcus Rohrbach, Sikandar Amin, Mykhaylo Andriluka, Bernt Schiele - In CVPR, 2012
Cited by 40 (5 self)
While activity recognition is a current focus of research, the challenging problem of fine-grained activity recognition is largely overlooked. We thus propose a novel database of 65 cooking activities, continuously recorded in a realistic setting. Activities are distinguished by fine-grained body motions that have low inter-class variability and high intra-class variability due to diverse subjects and ingredients. We benchmark two approaches on our dataset, one based on articulated pose tracks and the second using holistic video features. While the holistic approach outperforms the pose-based approach, our evaluation suggests that fine-grained activities are more difficult to detect and the body model can help in those cases. Providing high-resolution videos as well as an intermediate pose representation, we hope to foster research in fine-grained activity recognition.

Citation Context

...t to 75) and as video streams (compressed weakly with mpeg4v2 at a bitrate of 2500).

Method              Torso  Head  Upper arm r  Upper arm l  Lower arm r  Lower arm l  All
Original models
  CPS [29]          67.1   0.0   53.4         48.6         47.3         37.0         42.2
  FMP [41]          63.9   72.1  60.2         59.6         42.1         46.7         57.4
  PS [3]            58.0   45.5  50.5         57.2         43.3         38.8         48.9
Trained on our data
  FMP [41]          79.6   67.7  60.7         60.8         50.1         50.3         61.5
  PS [3]            80.1   80.0  67.8         69.6         48.9         49.6         66.0
  FPS (our model) ...

Detecting actions, poses, and objects with relational phraselets

by Chaitanya Desai, Deva Ramanan - In ECCV, 2012
Cited by 35 (3 self)
Abstract: We present a novel approach to modeling human pose, together with interacting objects, based on compositional models of local visual interactions and their relations. Skeleton models, while flexible enough to capture large articulations, fail to accurately model self-occlusions and interactions. Poselets and Visual Phrases address this limitation, but do so at the expense of requiring a large set of templates. We combine all three approaches with a compositional model that is flexible enough to model detailed articulations but still captures occlusions and object interactions. Unlike much previous work on action classification, we do not assume test images are labeled with a person, and instead present results for “action detection” in an unlabeled image. Notably, for each detection, our model reports back a detailed description including an action label, articulated human pose, object poses, and occlusion flags. We demonstrate that modeling occlusion is crucial for recognizing human-object interactions. We present results on the PASCAL Action

Citation Context

...ularized through 2D pictorial structure models that allow for efficient inference given tree-structured spatial relations [1]. We specifically follow the flexible mixtures of parts (FMP) framework of [2], which augments a standard pictorial structure with local part mixtures. While such methods are flexible enough to capture large variations in appearance due to pose, they still fail to accurately ca...

Joint deep learning for pedestrian detection

by Wanli Ouyang, Xiaogang Wang - In ICCV, 2013
Cited by 34 (11 self)
Feature extraction, deformation handling, occlusion handling, and classification are four important components in pedestrian detection. Existing methods learn or design these components either individually or sequentially. The interaction among these components is not yet well explored. This paper proposes that they should be jointly learned in order to maximize their strengths through cooperation. We formulate these four components into a joint deep learning framework and propose a new deep network architecture. By establishing automatic, mutual interaction among components, the deep model achieves a 9% reduction in the average miss rate compared with the current best-performing pedestrian detection approaches on the largest Caltech benchmark dataset.

Citation Context

...ling translational movement of parts. To handle more complex articulations, size change and rotation of parts are modeled in [18], and mixture of part appearance and articulation types are modeled in [4, 55, 6]. In these approaches, features are manually designed. In order to handle occlusion, many approaches have been proposed for estimating the visibility of parts [13, 51, 54, 53, 45, 27]. Some of them us...

Parsing Clothing in Fashion Photographs

by Kota Yamaguchi, M. Hadi Kiapour, Luis E. Ortiz, Tamara L. Berg
Cited by 33 (3 self)
In this paper we demonstrate an effective method for parsing clothing in fashion photographs, an extremely challenging problem due to the large number of possible garment items, variations in configuration, garment appearance, layering, and occlusion. In addition, we provide a large novel dataset and tools for labeling garment items, to enable future research on clothing estimation. Finally, we present intriguing initial results on using clothing estimates to improve pose identification, and demonstrate a prototype application for pose-independent visual garment retrieval.

Human Pose Estimation using Body Parts Dependent Joint Regressors

by Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
Cited by 31 (6 self)
In this work, we address the problem of estimating 2D human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.

Citation Context

...s [42] of the body parts. In object detection, one of the best performing methods relies on so called deformable part models [10], which use mixtures of star models over templates of parts. Recently, [40] showed that mixtures of part templates can also be efficiently used in a tree model, leading to very powerful pose estimation models. In particular, instead of modeling the transformations of a singl...

Joint training of a convolutional network and a graphical model for human pose estimation

by Jonathan Tompson, Arjun Jain, Yann LeCun, Christoph Bregler, 2014
Cited by 31 (2 self)
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.

Citation Context

...rial Structures” such as Felzenszwalb and colleagues work [12] made this approach tractable with so called ‘Deformable Part Models (DPM)’. Subsequently a large number of related models were developed [2, 11, 37, 10]. Algorithms which model more complex joint relationships, such as Yang and Ramanan [37], use a flexible mixture of templates modeled by linear SVMs. Johnson and Everingham [17] employ a cascade of bo...

Histograms of Sparse Codes for Object Detection

by Xiaofeng Ren, Deva Ramanan
Cited by 28 (2 self)
Object detection has seen huge progress in recent years, much thanks to the heavily-engineered Histograms of Oriented Gradients (HOG) features. Can we go beyond gradients and do better than HOG? We provide an affirmative answer by proposing and investigating a sparse representation for object detection, Histograms of Sparse Codes (HSC). We compute sparse codes with dictionaries learned from data using K-SVD, and aggregate per-pixel sparse codes to form local histograms. We intentionally keep true to the sliding window framework (with mixtures and parts) and only change the underlying features. To keep training (and testing) efficient, we apply dimension reduction by computing SVD on learned models, and adopt supervised training where latent positions of roots and parts are given externally, e.g. from a HOG-based detector. By learning and using local representations that are much more expressive than gradients, we demonstrate large improvements over the state of the art on the PASCAL benchmark for both root-only and part-based models.

Citation Context

... popular Deformable Parts Model (DPM) [13], the Exemplar-SVM model [21], and pretty much every other modern object detector. HOG is also seeing increasing use in other domains such as pose estimation [34], face recognition [35], and scene classification [32]. The HOG features, heavily engineered for both accuracy and speed, are not without issues or limits. They are gradient-based and lack the ability...
