Results 1 - 10 of 26
Realtime and Robust Hand Tracking from Depth
Cited by 15 (1 self)
We present a realtime hand tracking system using a depth sensor. It tracks a fully articulated hand under large viewpoints in realtime (25 FPS on a desktop without using a GPU) and with high accuracy (error below 10 mm). To our knowledge, it is the first system that achieves such robustness, accuracy, and speed simultaneously, as verified on challenging real data. Our system is made of several novel techniques. We model a hand simply using a number of spheres and define a fast cost function. Those are critical for realtime performance. We propose a hybrid method that combines gradient-based and stochastic optimization methods to achieve fast convergence and good accuracy. We present new finger detection and hand initialization methods that greatly enhance the robustness of tracking.
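The sphere model and fast cost function can be illustrated with a minimal sketch. The formulation below (squared distance from each depth point to the nearest sphere surface) is an assumption for illustration, not the authors' exact cost:

```python
import math

def sphere_model_cost(spheres, points):
    """Toy cost for a sphere-based hand model (a sketch, not the
    paper's formulation): for each observed depth point, take the
    distance to the surface of the nearest model sphere and sum
    the squared residuals.

    spheres: list of (cx, cy, cz, radius)
    points:  list of (x, y, z) samples from the depth map
    """
    total = 0.0
    for px, py, pz in points:
        best = float("inf")
        for cx, cy, cz, r in spheres:
            # unsigned distance from the point to this sphere's surface
            d = abs(math.dist((px, py, pz), (cx, cy, cz)) - r)
            best = min(best, d)
        total += best * best
    return total

# A point lying exactly on a unit sphere at the origin costs nothing.
spheres = [(0.0, 0.0, 0.0, 1.0)]
print(sphere_model_cost(spheres, [(1.0, 0.0, 0.0)]))  # → 0.0
```

Because the cost is a sum of cheap per-point, per-sphere terms, it is easy to evaluate at framerate, which is the property the abstract credits for realtime performance.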
In-air Gestures Around Unmodified Mobile Devices
Cited by 6 (0 self)
Figure 1: Touch input is expressive but can occlude large parts of the screen (A). We propose a machine learning based algorithm for gesture recognition expanding the interaction space around the mobile device (B), adding in-air gestures and hand-part tracking (D) to commodity off-the-shelf mobile devices, relying only on the device’s camera (and no hardware modifications). We demonstrate a number of compelling interactive scenarios including bi-manual input to mapping and gaming applications (C+D). The algorithm runs in real time and can even be used on ultra-mobile devices such as smartwatches (E). We present a novel machine learning based algorithm extending the interaction space around mobile devices. The technique uses only the RGB camera now commonplace on off-the-shelf mobile devices. Our algorithm robustly recognizes a wide range of in-air gestures, supporting user variation and varying lighting conditions. We demonstrate that our algorithm runs in real-time on unmodified mobile devices, including resource-constrained smartphones and smartwatches. Our goal is not to replace the touchscreen as primary input device, but rather to augment and enrich the existing interaction vocabulary using gestures. While touch input works well for many scenarios, we demonstrate numerous interaction tasks such as mode switches, application and task management, menu selection and certain types of navigation, where such input can be either complemented or better served by in-air gestures. This removes screen real-estate issues on small touchscreens, and allows input to be expanded to the 3D space around the device. We present results for recognition accuracy (93% test and 98% train), impact of memory footprint and other model parameters. Finally, we report results from preliminary user evaluations, discuss advantages and limitations and conclude with directions for future work.
Author Keywords: HCI; mobile interaction; mobile gestures; gesture recognition; mobile computing; random forests
Latent-Class Hough Forests for 3D Object Detection and Pose Estimation
Cited by 4 (0 self)
Abstract. In this paper we propose a novel framework, Latent-Class Hough Forests, for 3D object detection and pose estimation in heavily cluttered and occluded scenes. Firstly, we adapt the state-of-the-art template matching feature, LINEMOD [14], into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. In training, rather than explicitly collecting representative negative samples, our method is trained on positive samples only and we treat the class distributions at the leaf nodes as latent variables. During the inference process we iteratively update these distributions, providing accurate estimation of background clutter and foreground occlusions and thus a better detection rate. Furthermore, as a by-product, the latent class distributions can provide accurate occlusion aware segmentation masks, even in the multi-instance scenario. In addition to an existing public dataset, which contains only single-instance sequences with large amounts of clutter, we have collected a new, more challenging, dataset for multiple-instance detection containing heavy 2D and 3D clutter as well as foreground occlusions. We evaluate the Latent-Class Hough Forest on both of these datasets, where we outperform state-of-the-art methods.
Real-time Hand Tracking Using a Sum of Anisotropic Gaussians Model
Cited by 3 (0 self)
Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time. The main contributions include a new generative tracking method which employs an implicit hand shape representation based on Sum of Anisotropic Gaussians (SAG), and a pose fitting energy that is smooth and analytically differentiable, making fast gradient-based pose optimization possible. This shape representation, together with a full perspective projection model, enables more accurate hand modeling than a related baseline method from literature. Our method achieves better accuracy than previous methods and runs at 25 fps. We show these improvements both qualitatively and quantitatively on publicly available datasets.
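A rough sense of why a Gaussian-mixture shape representation yields a smooth, differentiable energy: the overlap of two Gaussians has a closed form that varies smoothly with their centers. The sketch below uses isotropic Gaussians for simplicity, whereas the paper's are anisotropic; all names are illustrative:

```python
import math

def gaussian_overlap(g1, g2):
    """Closed-form integral of the product of two isotropic 3D
    Gaussians (a simplified stand-in for anisotropic ones). It is
    smooth in the centers, hence usable in a gradient-based pose
    optimizer. Each argument is (mean_xyz, sigma)."""
    (m1, s1), (m2, s2) = g1, g2
    d2 = sum((a - b) ** 2 for a, b in zip(m1, m2))  # squared center distance
    s = s1 ** 2 + s2 ** 2                           # combined variance
    return (2.0 * math.pi * s) ** -1.5 * math.exp(-d2 / (2.0 * s))

# A similarity energy between a model mixture and an observation
# mixture is then just a double sum of pairwise overlaps.
model = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), 0.5)]
obs = [((0.1, 0.0, 0.0), 1.0)]
similarity = sum(gaussian_overlap(m, o) for m in model for o in obs)
print(similarity)  # grows smoothly as model and observation align
```

The key design point is that every term has analytic derivatives with respect to the Gaussian centers, so the full energy can be optimized with fast gradient methods rather than sampling alone.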
Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images
Cited by 3 (1 self)
Abstract—In this work, we address the problem of estimating 2D human pose from still images. Articulated body pose estimation is challenging due to the large variation in body poses and appearances of the different body parts. Recent methods that rely on the pictorial structure framework have been shown to be very successful in solving this task. They model the body part appearances using discriminatively trained, independent part templates and the spatial relations of the body parts using a tree model. Within such a framework, we address the problem of obtaining better part templates which are able to handle a very high variation in appearance. To this end, we introduce parts dependent body joint regressors, which are random forests that operate over two layers. While the first layer acts as an independent body part classifier, the second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This helps to overcome typical ambiguities of tree structures, such as self-similarities of legs and arms. In addition, we introduce a novel dataset termed FashionPose that contains over 7,000 images with a challenging variation of body part appearances due to a large variation of dressing styles. In the experiments, we demonstrate that the proposed parts dependent joint regressors outperform independent classifiers or regressors. The method also performs better than or similar to the state-of-the-art in terms of accuracy, while running at a couple of frames per second. Index Terms—Human pose estimation, fashion, random forest, regression, classification
Cascaded hand pose regression
- In CVPR
Cited by 2 (0 self)
We extend the previous 2D cascaded object pose regression work [9] in two aspects so that it works better for 3D articulated objects. Our first contribution is 3D pose-indexed features that generalize the previous 2D parameterized features and achieve better invariance to 3D transformations. Our second contribution is a principled hierarchical regression that is adapted to the articulated object structure. It is therefore more accurate and faster. Comprehensive experiments verify the state-of-the-art accuracy and efficiency of the proposed approach on the challenging 3D hand pose estimation problem, on a public dataset and our new dataset.
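The overall cascaded regression loop shared by [9] and this work is simple: each stage computes features indexed by the current pose estimate and predicts an additive update. The sketch below shows the generic loop, with a toy 1-D regressor standing in for the learned stages:

```python
def cascaded_regression(init_pose, features, regressors):
    """Generic cascaded pose regression loop (a sketch of the
    scheme, not the paper's 3D pose-indexed variant): each stage
    re-extracts features relative to the current pose and adds a
    predicted correction."""
    pose = init_pose
    for stage in regressors:
        f = features(pose)       # pose-indexed features
        pose = pose + stage(f)   # additive pose update
    return pose

# Toy example: three stages, each regressing halfway toward a
# 1-D target pose of 10.0 (stand-ins for trained regressors).
target = 10.0
stages = [lambda f: 0.5 * (target - f)] * 3
print(cascaded_regression(0.0, lambda p: p, stages))  # → 8.75
```

Indexing the features by the current pose is what makes each stage's residual problem easier than regressing the pose from raw pixels in one shot; the paper's contribution is making that indexing invariant to 3D transformations.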
Depth-based hand pose estimation: data, methods, and challenges
Cited by 1 (0 self)
Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state-of-the-art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation code. We summarize important conclusions here: (1) Pose estimation appears roughly solved for scenes with isolated hands. However, methods still struggle to analyze cluttered scenes where hands may be interacting with nearby objects and surfaces. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define a consistent evaluation criterion, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. It also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
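The nearest-neighbor baseline mentioned in point (3) is, in its simplest form, a lookup of the training pose whose features are closest to the query; the feature vectors and pose labels below are placeholders, not the paper's descriptor:

```python
import math

def nearest_neighbor_pose(query, training_set):
    """Simplest possible nearest-neighbor pose baseline: return
    the stored pose of the training example whose feature vector
    is closest (Euclidean) to the query's feature vector."""
    best_pose, best_d = None, float("inf")
    for feat, pose in training_set:
        d = math.dist(query, feat)
        if d < best_d:
            best_pose, best_d = pose, d
    return best_pose

# Hypothetical 2-D features paired with pose labels.
train = [((0.0, 0.0), "open_hand"), ((1.0, 1.0), "fist")]
print(nearest_neighbor_pose((0.9, 0.8), train))  # → fist
```

That such a memorization-style baseline beats most systems is precisely the abstract's evidence that those systems do not generalize much beyond their training sets.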
Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose
Cited by 1 (1 self)
We address the problem of hand pose estimation, formulated as an inverse problem. Typical approaches optimize an energy function over pose parameters using a ‘black box’ image generation procedure. This procedure knows little about either the relationships between the parameters or the form of the energy function. In this paper, we show that we can significantly improve upon black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function. Our new framework, hierarchical sampling optimization, consists of a sequence of predictors organized into a kinematic hierarchy. Each predictor is conditioned on its ancestors, and generates a set of samples over a subset of the pose parameters. The highly efficient surrogate energy is used to select among samples. Having evaluated the full hierarchy, the partial pose samples are concatenated to generate a full-pose hypothesis. Several hypotheses are generated using the same procedure, and finally the original full energy function selects the best result. Experimental evaluation on three publicly available datasets shows that our method is particularly impressive in low-compute scenarios, where it significantly outperforms all other state-of-the-art methods.
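The sampling scheme described above can be sketched as follows; the predictor, energy, and parameter names are illustrative, not the paper's implementation:

```python
def hierarchical_sampling(predictors, surrogate_energy, full_energy, n_hypotheses=10):
    """Sketch of hierarchical sampling optimization: walk a
    kinematic hierarchy of predictors root-to-leaf; each predictor
    proposes candidate values for its subset of pose parameters
    conditioned on its ancestors, and a cheap surrogate energy
    keeps the best partial sample. Concatenated partials form one
    full hypothesis; the full energy picks the overall winner."""
    hypotheses = []
    for _ in range(n_hypotheses):
        partial = []
        for predictor in predictors:      # root-to-leaf order
            samples = predictor(partial)  # conditioned on ancestors
            best = min(samples, key=lambda s: surrogate_energy(partial + [s]))
            partial.append(best)
        hypotheses.append(partial)
    return min(hypotheses, key=full_energy)

# Toy 2-parameter pose with target [1.0, 2.0]; here the same
# function serves as both surrogate and full energy.
predictors = [
    lambda ancestors: [0.0, 1.0],  # candidates for parameter 0
    lambda ancestors: [2.0, 3.0],  # candidates for parameter 1
]
def energy(pose):
    target = [1.0, 2.0]
    return sum(abs(p - t) for p, t in zip(pose, target))

print(hierarchical_sampling(predictors, energy, energy, n_hypotheses=1))  # → [1.0, 2.0]
```

The efficiency argument is that the expensive full energy is evaluated only once per hypothesis, while the many per-parameter candidates are screened by the cheap surrogate.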
Structured Semi-supervised Forest for Facial Landmarks Localization with Face Mask Reasoning
Cited by 1 (0 self)
Despite the great success of recent facial landmarks localization approaches, the presence of occlusions significantly degrades the performance of the systems. However, very few works have addressed this problem explicitly due to the high diversity of occlusion in the real world. In this paper, we address face mask reasoning and facial landmarks localization in a unified Structured Decision Forests framework. We first annotate a portion of the face dataset with face masks, i.e., for each face image we give each pixel a label to indicate whether it belongs to the face or not. Then we incorporate such additional information of dense pixel labelling into training the Structured Classification-Regression Decision Forest. The classification nodes aim at decreasing the variance of the pixel labels of the patches by using our proposed structured criterion, while the regression nodes aim at decreasing the variance of the displacements between the patches and the facial landmarks. The proposed framework allows us to predict the face mask and facial landmark locations jointly. We test the model on face images from several datasets with significant occlusion. The proposed method 1) yields promising results in face mask reasoning; 2) improves the existing Decision Forests approaches in facial landmark localization, aided by the face mask reasoning.