Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

by Xianjie Chen, Alan Yuille

Results 1 - 10 of 13

Efficient Object Localization Using Convolutional Networks. arXiv preprint

by Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, Christopher Bregler , 2015
"... Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolu-tional Networks (ConvNets). Traditional ConvNet architec-tures include pooling layers which reduce computational re-quirements, introduce invariance and prevent over-training. These benefits of pool ..."
Abstract - Cited by 6 (0 self) - Add to MetaCart
Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling layers which reduce computational requirements, introduce invariance and prevent over-training. These benefits of pooling come at the cost of reduced localization accuracy. We introduce a novel architecture which includes an efficient 'position refinement' model that is trained to estimate the joint offset location within a small region of the image. This refinement model is jointly trained in cascade with a state-of-the-art ConvNet model [21] to achieve improved accuracy in human joint location estimation. We show that the variance of our detector approaches the variance of human annotations on the FLIC [20] dataset and outperforms all existing approaches on the MPII-human-pose dataset [1].
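
The cascade described in this abstract boils down to a simple two-stage data flow: a coarse network produces a per-joint heatmap on a strided grid, and a refinement network regresses the residual offset of the joint inside a small crop centered on the coarse estimate. The sketch below is a hedged illustration of that idea in Python/NumPy only; coarse_net, refine_net, crop_size and stride are hypothetical stand-ins, not the authors' code, and border handling is omitted for brevity.

import numpy as np

def estimate_joint(image, coarse_net, refine_net, crop_size=32, stride=8):
    # Stage 1: coarse per-joint heatmap on a strided grid (coarse_net is a
    # hypothetical trained ConvNet returning an (H/stride, W/stride) array).
    heatmap = coarse_net(image)
    gy, gx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    coarse_xy = np.array([gx, gy]) * stride  # back to pixel coordinates

    # Stage 2: crop a small window around the coarse estimate and regress the
    # residual offset of the joint inside that window (refine_net is again a
    # hypothetical trained model returning (dx, dy) in pixels).
    y0 = int(coarse_xy[1]) - crop_size // 2
    x0 = int(coarse_xy[0]) - crop_size // 2
    crop = image[y0:y0 + crop_size, x0:x0 + crop_size]
    dx, dy = refine_net(crop)

    return coarse_xy + np.array([dx, dy])

Per the abstract, the refinement model is trained jointly in cascade with the coarse ConvNet; the sketch only shows the inference-time data flow.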

Citation Context

... of human-body part localization has made significant progress in recent years. This has been in part due to the success of Deep Learning architectures - specifically Convolutional Networks (ConvNets) [21, 14, 22, 5] - but also due to the availability of ever larger and more comprehensive datasets [1, 16, 20] (our model's predictions for difficult examples from [1] are shown in Figure 1). A common characteristic ...

Efficient ConvNet-based Marker-less Motion Capture in General Scenes with a Low Number of Cameras

by A. Elhayek, E. De Aguiar, A. Jain, J. Tompson, L. Pishchulin, M. Andriluka, C. Bregler, B. Schiele, C. Theobalt
"... We present a novel method for accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generat ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
We present a novel method for accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy. The discriminative part-based pose detection method, implemented using Convolutional Networks (ConvNet), estimates unary potentials for each joint of a kinematic skeleton model. These unary potentials are used to probabilistically extract pose constraints for tracking by using weighted sampling from a pose posterior guided by the model. In the final energy, these constraints are combined with an appearance-based model-to-image similarity term. Poses can be computed very efficiently using iterative local optimization, as ConvNet detection is fast, and our formulation yields a combined pose estimation energy with analytic derivatives. In combination, this enables tracking of full articulated joint angles at state-of-the-art accuracy and temporal stability with a very low number of cameras.

Citation Context

...ed and not learnt. Convolutional networks (ConvNets) are by far the best performing algorithms for many vision tasks. The state-of-the-art methods for human-pose estimation are also based on ConvNets ([40, 22, 39, 23, 11]). Toshev et al. [40] formulate the problem as a direct regression to joint location. Chen et al. [11] improve over [40] by adding an image dependent spatial prior. Jain et al. [22] train an image pat...

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs. arXiv preprint arXiv:1605.09346

by Anton Osokin , Jean-Baptiste Alayrac , Isabella Lukasewitz , Puneet K Dokania , Simon Lacoste-Julien , 2016
"... Abstract In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from ...

Citation Context

...d in Section 3; Section 5.2 evaluates our approach on the regularization path estimation. Datasets. We evaluate our methods on four datasets for different structured prediction tasks: OCR (Taskar et al., 2003) for handwritten character recognition, CoNLL (Tjong Kim Sang & Buchholz, 2000) for text chunking, HorseSeg (Kolesnikov et al., 2014) for binary image segmentation and LSP (Johnson & Everingham, 2010) for pose estimation. The models for OCR and CoNLL were provided by Lacoste-Julien et al. (2013). We build our model based on the one by Kolesnikov et al. (2014) for HorseSeg, and the one by Chen & Yuille (2014) for LSP. For OCR and CoNLL, the max oracle consists of the Viterbi algorithm (Viterbi, 1967); for HorseSeg – in graph cut (Boykov & Kolmogorov, 2004), for LSP – in belief propagation on a tree with messages passed by a generalized distance transform (Felzenszwalb & Huttenlocher, 2005). Note that the oracles of HorseSeg and LSP require positivity constraints on a subset of the weights in order to be tractable. The BCFW algorithm with positivity constraints is derived in App. H. We provide a detailed description of the datasets in App. I with a summary in Table 1. The problems included in our e...
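
The LSP oracle quoted above passes max-product messages on a tree using the generalized distance transform of Felzenszwalb & Huttenlocher (2005), which is linear-time when the pairwise cost is a quadratic with a positive weight (exactly the kind of positivity constraint the excerpt mentions). The 1-D routine below is a hedged sketch of that primitive in Python/NumPy, not the cited paper's code; the deformation weight w is an assumed parameter and must be positive.

import numpy as np

def distance_transform_1d(cost, w=1.0):
    # Computes out[p] = min_q cost[q] + w * (p - q)**2 in O(n) by tracking the
    # lower envelope of parabolas (Felzenszwalb & Huttenlocher, 2005).
    # Requires w > 0.
    n = len(cost)
    v = np.zeros(n, dtype=int)        # indices of parabolas forming the envelope
    z = np.full(n + 1, np.inf)        # boundaries between adjacent parabolas
    z[0] = -np.inf
    k = 0
    for q in range(1, n):
        while True:
            # position where the parabola rooted at q intersects the rightmost one
            s = ((cost[q] + w * q * q) - (cost[v[k]] + w * v[k] * v[k])) / (2.0 * w * (q - v[k]))
            if s <= z[k]:
                k -= 1                # parabola q hides the previous one; pop it
            else:
                break
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = np.inf
    out = np.empty(n)
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1
        out[p] = cost[v[k]] + w * (p - v[k]) ** 2
    return out

A 2-D message over image positions is obtained by running this transform along rows and then along columns, which is what keeps each message linear in the number of candidate locations.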

P-CNN: Pose-based CNN Features for Action Recognition

by Ivan Laptev, Cordelia Schmid , 2015
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Citation Context

...mes of temporal aggregation. The extraction of proposed Pose-based Convolutional Neural Network (P-CNN) features is illustrated in Figure 1. Pose estimation in natural images is still a difficult task [7, 37, 42]. In this paper we investigate P-CNN features both for automatically estimated as well as manually annotated human poses. We report experimental results for two challenging datasets: JHMDB [19], a sub...

Fine-grained classification of pedestrians in video: benchmark and state-of-the-art

by David Hall, Pietro Perona , 2015
"... A video dataset that is designed to study fine-grained cat-egorisation of pedestrians is introduced. Pedestrians were recorded “in-the-wild ” from a moving vehicle. Annotations include bounding boxes, tracks, 14 keypoints with occlu-sion information and the fine-grained categories of age (5 classes) ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
A video dataset that is designed to study fine-grained categorisation of pedestrians is introduced. Pedestrians were recorded "in-the-wild" from a moving vehicle. Annotations include bounding boxes, tracks, 14 keypoints with occlusion information and the fine-grained categories of age (5 classes), sex (2 classes), weight (3 classes) and clothing style (4 classes). There are a total of 27,454 bounding box and pose labels across 4222 tracks. This dataset is designed to train and test algorithms for fine-grained categorisation of people; it is also useful for benchmarking tracking, detection and pose estimation of pedestrians. State-of-the-art algorithms for fine-grained classification and pose estimation were tested using the dataset and the results are reported as a useful performance baseline.

Citation Context

...ined categorisation techniques rely on parts, it is important to look at pose estimation. To benchmark human pose estimation we used the state-of-the-art, articulated pose estimator of Chen and Yuille [8]. This method extends Yang and Ramanan's work [41] to use deep features. The code is publicly available. Two experiments were run using a single train/test split. In the first experiment, the pose mod...

CRF-CNN: Modeling Structured Information in Human Pose Estimation

by Xiao Chu , Wanli Ouyang , Hongsheng Li , Xiaogang Wang
"... Abstract Deep convolutional neural networks (CNN) have achieved great success. On the other hand, modeling structural information has been proved critical in many vision problems. It is of great interest to integrate them effectively. In a classical neural network, there is no message passing betwe ..."
Abstract - Add to MetaCart
Deep convolutional neural networks (CNN) have achieved great success. On the other hand, modeling structural information has been proven critical in many vision problems. It is of great interest to integrate them effectively. In a classical neural network, there is no message passing between neurons in the same layer. In this paper, we propose a CRF-CNN framework which can simultaneously model structural information in both output and hidden feature layers in a probabilistic way, and it is applied to human pose estimation. A message passing scheme is proposed, so that in various layers each body joint receives messages from all the others in an efficient way. Such message passing can be implemented with convolution between feature maps in the same layer, and it is also integrated with feedforward propagation in neural networks. Finally, a neural network implementation of end-to-end learning CRF-CNN is provided. Its effectiveness is demonstrated through experiments on two benchmark datasets.
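
One minimal way to picture "message passing implemented with convolution between feature maps in the same layer" is sketched below: each joint's score map receives evidence from every other joint's map, shifted and spread by a small spatial kernel, and the messages are summed into the joint's own map. This is only an illustrative reading of the abstract, not the paper's exact parameterization; pass_messages and the kernels dictionary are hypothetical names introduced here.

import numpy as np
from scipy.signal import convolve2d

def pass_messages(score_maps, kernels):
    # score_maps: dict joint_name -> (H, W) array of per-joint scores
    # kernels:    dict (src, dst) -> small 2-D kernel encoding where joint dst
    #             is expected relative to joint src (hypothetical learned weights)
    updated = {}
    for dst, own in score_maps.items():
        msg_sum = np.zeros_like(own)
        for src, src_map in score_maps.items():
            if src == dst:
                continue
            # message from src to dst: spread src's evidence by the spatial kernel,
            # realized as a same-size 2-D convolution
            msg_sum += convolve2d(src_map, kernels[(src, dst)], mode="same")
        updated[dst] = own + msg_sum  # combine the joint's own score with incoming messages
    return updated

Stacking a few such rounds, interleaved with ordinary feedforward layers, gives the kind of structured, end-to-end trainable network the abstract describes.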

Citation Context

...volution between feature maps in the same layer, and it is also integrated with feedforward propagation in neural networks. Finally, a neural network implementation of end-to-end learning CRF-CNN is provided. Its effectiveness is demonstrated through experiments on two benchmark datasets. 1 Introduction A lot of efforts have been devoted to structure design of convolutional neural network (CNN). They can be divided into two groups. One is to achieve higher expressive power by making CNN deeper [19, 10, 20]. The other is to model structures among features and outputs, either as post processing [6, 2] or as extra information to guide the learning of CNN [29, 22, 24]. They are complementary. Human pose estimation is to estimate body joint locations from 2D images, which could be applied to assist other tasks such as [4, 14, 26]. The very first attempt adopting CNN for human pose estimation is DeepPose [23]. It used CNN to regress joint locations repeatedly without directly modeling the output structure. However, the prediction of body joint locations relies both on their own appearance scores and the prediction of other joints. Hence, the output space for human pose estimation is structured....

Scene-Domain Active Part Models for Object Representation

by Zhou Ren, Chaohui Wang, Alan Yuille
"... In this paper, we are interested in enhancing the expres-sivity and robustness of part-based models for object repre-sentation, in the common scenario where the training data are based on 2D images. To this end, we propose scene-domain active part models (SDAPM), which reconstruct and characterize t ..."
Abstract - Add to MetaCart
In this paper, we are interested in enhancing the expressivity and robustness of part-based models for object representation, in the common scenario where the training data are based on 2D images. To this end, we propose scene-domain active part models (SDAPM), which reconstruct and characterize the 3D geometric statistics between an object's parts in the 3D scene-domain by using 2D training data in the image-domain alone. On top of this, we explicitly model and handle occlusions in SDAPM. Together with the developed learning and inference algorithms, such a model provides rich object descriptions, including 2D object and parts localization, 3D landmark shape and camera viewpoint, which offers an effective representation for various image understanding tasks, such as object and parts detection, 3D landmark shape and viewpoint estimation from images. Experiments on the above tasks show that SDAPM outperforms previous part-based models, and thus demonstrates the potential of the proposed technique.

Citation Context

...ion. While in the experiments on 2D, 3D pose and viewpoint estimation, we construct our model based on the Mixture-of-Parts structure as [41] of 10 part types, based on HOG feature and CNN feature as [7]. We use Caffe [23] to compute the CNN feature. When learning the scene-domain geometric subspace B, we follow the NRSFM techniques [9] to set the geometric subspace bases number K = 5 for object ...

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (published as a conference paper at ICLR 2015)

by Liang-chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
"... Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and ob-ject detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classific ..."
Abstract - Add to MetaCart
Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy which is beyond previous methods. Quantitatively, our method sets the new state-of-the-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set. We show how these results can be obtained efficiently: careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.
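
The 'hole' (atrous) trick mentioned at the end of the abstract can be illustrated in one dimension: the filter taps are spread apart by a dilation rate, so responses can be computed densely at the input resolution without extra downsampling. The toy routine below is a sketch of that idea only, not the DeepLab implementation; atrous_conv1d and rate are names chosen here for illustration.

import numpy as np

def atrous_conv1d(x, w, rate=2):
    # Correlate x with a filter w whose taps are spaced `rate` samples apart
    # ('holes' between taps), producing a dense, same-length response.
    pad_total = rate * (len(w) - 1)
    xp = np.pad(np.asarray(x, dtype=float), (pad_total // 2, pad_total - pad_total // 2))
    out = np.zeros(len(x))
    for i in range(len(x)):
        taps = xp[i : i + rate * len(w) : rate]   # input samples under the dilated filter
        out[i] = np.dot(taps, w)
    return out

Rate 1 recovers an ordinary (correlation-style) convolution; in DeepLab the same idea is applied in 2-D inside the network so that dense responses come out without the usual downsampling of the final layers.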

Citation Context

...ity to learn hierarchical abstractions of data (Zeiler & Fergus, 2014). While this invariance is clearly desirable for high-level vision tasks, it can hamper low-level tasks, such as pose estimation (Chen & Yuille, 2014; Tompson et al., 2014) and semantic segmentation - where we want precise localization, rather than abstraction of spatial details. There are two technical hurdles in the application of DCNNs to image...

Human Pose Estimation in Videos

by Dong Zhang, Mubarak Shah
"... In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. In contrast to the commonly employed graph optimization framework, which is NP-hard and needs approximate solutions, we formulate this problem into a unified two stage tree-based optimization problem fo ..."
Abstract - Add to MetaCart
In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. In contrast to the commonly employed graph optimization framework, which is NP-hard and needs approximate solutions, we formulate this problem into a unified two-stage tree-based optimization problem for which an efficient and exact solution exists. Although the proposed method finds an exact solution, it does not sacrifice the ability to model the spatial and temporal constraints between body parts in the video frames; indeed it even models the symmetric parts better than the existing methods. The proposed method is based on two main ideas, 'Abstraction' and 'Association', to enforce the intra- and inter-frame body part constraints respectively without inducing extra computational complexity to the polynomial time solution. Using the idea of 'Abstraction', a new concept of 'abstract body part' is introduced to model not only the tree-based body part structure similar to existing methods, but also extra constraints between symmetric parts. Using the idea of 'Association', the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames. Finally, a sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization. We evaluated the proposed method on three publicly available video based human pose estimation datasets, and obtained dramatically improved performance compared to the state-of-the-art methods.

Citation Context

... spatial interactions among body parts. A novel, non-linear joint regressor model was proposed in [6], which handles typical ambiguities of tree based models quite well. More recently, deep learning ([36, 35, 18, 4]) has also been introduced for human pose estimation. For video based human pose estimation in unconstrained scenes, some early research adopted the tracking-by-detection framework ([1, 17, 25]). More...

HICO: A Benchmark for Recognizing Human-Object Interactions in Images

by Yu-wei Chao, Zhan Wang, Yugeng He, Jiaxuan Wang, Jia Deng
"... We introduce a new benchmark “Humans Interacting with Common Objects ” (HICO) for recognizing human-object interactions (HOI). We demonstrate the key features of HICO: a diverse set of interactions with common ob-ject categories, a list of well-defined, sense-based HOI cat-egories, and an exhaustive ..."
Abstract - Add to MetaCart
We introduce a new benchmark "Humans Interacting with Common Objects" (HICO) for recognizing human-object interactions (HOI). We demonstrate the key features of HICO: a diverse set of interactions with common object categories, a list of well-defined, sense-based HOI categories, and an exhaustive labeling of co-occurring interactions with an object category in each image. We perform an in-depth analysis of representative current approaches and show that DNNs enjoy a significant edge. In addition, we show that semantic knowledge can significantly improve HOI recognition, especially for uncommon categories.

Citation Context

...bear? Without an accurate understanding of the interaction, we will not be able to generate informative image descriptions besides a bag of objects. Despite significant advances in recognizing humans [2] and objects [14], the state of the art of HOI recognition in images is still far from the demands of real-world applications. A key bottleneck is the limited number of HOI categories ...
