Results 1-10 of 13
Efficient Object Localization Using Convolutional Networks. arXiv preprint, 2015
Cited by 6 (0 self)
Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling layers which reduce computational requirements, introduce invariance and prevent over-training. These benefits of pooling come at the cost of reduced localization accuracy. We introduce a novel architecture which includes an efficient ‘position refinement’ model that is trained to estimate the joint offset location within a small region of the image. This refinement model is jointly trained in cascade with a state-of-the-art ConvNet model [21] to achieve improved accuracy in human joint location estimation. We show that the variance of our detector approaches the variance of human annotations on the FLIC [20] dataset and outperforms all existing approaches on the MPII-human-pose dataset [1].
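The coarse-then-refine idea in this abstract can be illustrated with a minimal sketch. It assumes a coarse heatmap produced at a known pooling stride and a separate offset predictor; the function name `refine_joint_location`, the `stride` parameter and the `predict_offset` callback are hypothetical names chosen for illustration and are not taken from the paper.

```python
def refine_joint_location(coarse_heatmap, stride, predict_offset):
    """Return an (x, y) joint estimate in image pixels.

    coarse_heatmap: list of rows of scores at reduced resolution.
    stride: assumed downsampling factor between image and heatmap.
    predict_offset: hypothetical stand-in for a refinement model; it
        takes the coarse (x, y) and returns a small (dx, dy) correction.
    """
    n_rows = len(coarse_heatmap)
    n_cols = len(coarse_heatmap[0])
    # Locate the highest-scoring cell of the coarse heatmap.
    best_row, best_col = max(
        ((r, c) for r in range(n_rows) for c in range(n_cols)),
        key=lambda rc: coarse_heatmap[rc[0]][rc[1]],
    )
    # Map the cell back to image coordinates (pooling lost the fine position).
    coarse_x = best_col * stride
    coarse_y = best_row * stride
    # Add the offset estimated within the small region around the coarse guess.
    dx, dy = predict_offset(coarse_x, coarse_y)
    return coarse_x + dx, coarse_y + dy
```

In this sketch the offset predictor is a plain callback; in a trained system it would presumably be a learned model operating on a cropped image region.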
Efficient ConvNet-based Marker-less Motion Capture in General Scenes with a Low Number of Cameras
Cited by 3 (0 self)
We present a novel method for accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy. The discriminative part-based pose detection method, implemented using Convolutional Networks (ConvNet), estimates unary potentials for each joint of a kinematic skeleton model. These unary potentials are used to probabilistically extract pose constraints for tracking by using weighted sampling from a pose posterior guided by the model. In the final energy, these constraints are combined with an appearance-based model-to-image similarity term. Poses can be computed very efficiently using iterative local optimization, as ConvNet detection is fast, and our formulation yields a combined pose estimation energy with analytic derivatives. In combination, this enables tracking of full articulated joint angles at state-of-the-art accuracy and temporal stability with a very low number of cameras.
Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs. arXiv preprint arXiv:1605.09346, 2016
Cited by 1 (0 self)
In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from ...
P-CNN: Pose-based CNN Features for Action Recognition, 2015
Cited by 1 (0 self)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Fine-grained classification of pedestrians in video: benchmark and state-of-the-art, 2015
Cited by 1 (0 self)
A video dataset that is designed to study fine-grained categorisation of pedestrians is introduced. Pedestrians were recorded “in-the-wild” from a moving vehicle. Annotations include bounding boxes, tracks, 14 keypoints with occlusion information and the fine-grained categories of age (5 classes), sex (2 classes), weight (3 classes) and clothing style (4 classes). There are a total of 27,454 bounding box and pose labels across 4222 tracks. This dataset is designed to train and test algorithms for fine-grained categorisation of people; it is also useful for benchmarking tracking, detection and pose estimation of pedestrians. State-of-the-art algorithms for fine-grained classification and pose estimation were tested using the dataset and the results are reported as a useful performance baseline.
CRF-CNN: Modeling Structured Information in Human Pose Estimation
Deep convolutional neural networks (CNN) have achieved great success. On the other hand, modeling structural information has been proved critical in many vision problems. It is of great interest to integrate them effectively. In a classical neural network, there is no message passing between neurons in the same layer. In this paper, we propose a CRF-CNN framework which can simultaneously model structural information in both output and hidden feature layers in a probabilistic way, and it is applied to human pose estimation. A message passing scheme is proposed, so that in various layers each body joint receives messages from all the others in an efficient way. Such message passing can be implemented with convolution between feature maps in the same layer, and it is also integrated with feedforward propagation in neural networks. Finally, a neural network implementation of end-to-end learning CRF-CNN is provided. Its effectiveness is demonstrated through experiments on two benchmark datasets.
Scene-Domain Active Part Models for Object Representation
In this paper, we are interested in enhancing the expressivity and robustness of part-based models for object representation, in the common scenario where the training data are based on 2D images. To this end, we propose scene-domain active part models (SDAPM), which reconstruct and characterize the 3D geometric statistics between object’s parts in 3D scene-domain by using 2D training data in the image-domain alone. And on top of this, we explicitly model and handle occlusions in SDAPM. Together with the developed learning and inference algorithms, such a model provides rich object descriptions, including 2D object and parts localization, 3D landmark shape and camera viewpoint, which offers an effective representation to various image understanding tasks, such as object and parts detection, 3D landmark shape and viewpoint estimation from images. Experiments on the above tasks show that SDAPM outperforms previous part-based models, and thus demonstrates the potential of the proposed technique.
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Published as a conference paper at ICLR 2015
Deep Convolutional Neural Networks (DCNNs) have recently shown state-of-the-art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called “semantic image segmentation”). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our “DeepLab” system is able to localize segment boundaries at a level of accuracy which is beyond previous methods. Quantitatively, our method sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set. We show how these results can be obtained efficiently: Careful network re-purposing and a novel application of the ‘hole’ algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.
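The ‘hole’ (à trous) idea mentioned in this abstract can be illustrated in one dimension: the filter taps are spaced `rate` samples apart, so the filter covers a wider span without adding weights or downsampling the signal. The following is a minimal pure-Python sketch under that assumption; the function name `atrous_conv1d` is hypothetical and this is not the DeepLab implementation.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D correlation with kernel taps spaced `rate` samples apart.

    rate=1 is an ordinary dense filter; rate>1 inserts rate-1 'holes'
    between consecutive taps.
    """
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    # Number of input samples spanned by the dilated kernel.
    span = (len(kernel) - 1) * rate + 1
    output = []
    for start in range(len(signal) - span + 1):
        total = 0
        for tap_index, weight in enumerate(kernel):
            total += weight * signal[start + tap_index * rate]
        output.append(total)
    return output
```

With a three-tap kernel, `rate=2` spans five input samples while still using only three weights, which is the property that allows dense output at the original resolution.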
Human Pose Estimation in Videos
In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. In contrast to the commonly employed graph optimization framework, which is NP-hard and needs approximate solutions, we formulate this problem into a unified two-stage tree-based optimization problem for which an efficient and exact solution exists. Although the proposed method finds an exact solution, it does not sacrifice the ability to model the spatial and temporal constraints between body parts in the video frames; indeed it even models the symmetric parts better than the existing methods. The proposed method is based on two main ideas: ‘Abstraction’ and ‘Association’ to enforce the intra- and inter-frame body part constraints respectively without inducing extra computational complexity to the polynomial time solution. Using the idea of ‘Abstraction’, a new concept of ‘abstract body part’ is introduced to model not only the tree based body part structure similar to existing methods, but also extra constraints between symmetric parts. Using the idea of ‘Association’, the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames. Finally, a sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization. We evaluated the proposed method on three publicly available video based human pose estimation datasets, and obtained dramatically improved performance compared to the state-of-the-art methods.
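The reason tree-structured problems admit exact, polynomial-time solutions can be shown with a generic max-sum dynamic program. The sketch below is a textbook illustration, not the authors' formulation; the function name `best_tree_labeling` and its argument layout (`children`, `unary`, `pairwise`) are assumptions made for this example.

```python
def best_tree_labeling(children, unary, pairwise, root=0):
    """Maximise the sum of unary and parent-child pairwise scores on a tree.

    children: dict mapping a node to the list of its child nodes.
    unary: dict mapping a node to a list of scores, one per candidate state.
    pairwise: function (parent, child, parent_state, child_state) -> score.
    Returns (best_total_score, {node: chosen_state}).
    """
    best_child_state = {}  # (child, parent_state) -> best state of that child

    def upward(node):
        # Best achievable score of the subtree at `node`, per state of `node`.
        scores = list(unary[node])
        for child in children.get(node, []):
            child_scores = upward(child)
            for parent_state in range(len(scores)):
                candidates = [
                    child_scores[cs] + pairwise(node, child, parent_state, cs)
                    for cs in range(len(child_scores))
                ]
                best = max(range(len(candidates)), key=candidates.__getitem__)
                best_child_state[(child, parent_state)] = best
                scores[parent_state] += candidates[best]
        return scores

    root_scores = upward(root)
    root_state = max(range(len(root_scores)), key=root_scores.__getitem__)

    # Backtrack from the root to recover the optimal state of every node.
    labeling = {root: root_state}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            labeling[child] = best_child_state[(child, labeling[node])]
            stack.append(child)
    return root_scores[root_state], labeling
```

Each edge is visited once and compares every pair of parent and child states, so the cost grows linearly with the number of nodes and quadratically with the number of candidate states per node.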
HICO: A Benchmark for Recognizing Human-Object Interactions in Images
We introduce a new benchmark “Humans Interacting with Common Objects” (HICO) for recognizing human-object interactions (HOI). We demonstrate the key features of HICO: a diverse set of interactions with common object categories, a list of well-defined, sense-based HOI categories, and an exhaustive labeling of co-occurring interactions with an object category in each image. We perform an in-depth analysis of representative current approaches and show that DNNs enjoy a significant edge. In addition, we show that semantic knowledge can significantly improve HOI recognition, especially for uncommon categories.